Endianness, stack growth (GDB & Ghidra experiments)

Stack growth direction

a. Stack growth direction is independent of endianness; it is determined by the architecture and its calling convention (ABI)

On x86 and x86-64, the stack grows from higher to lower memory addresses

Endianness

a. The byte order the CPU uses when it stores a multi-byte value to memory or loads it back (this applies to all memory, not only the stack)

b. Decides whether the least significant or the most significant byte sits at the lowest address

NOTE: Endianness only applies when bytes are interpreted as multi-byte numbers (integers, pointers, floats)
Strings are not affected by endianness, as they are sequences of single bytes, not numbers

Little endian

Given a written representation of a number (hexadecimal, decimal, etc.):

The leftmost digits form the most significant byte (MSB), while the rightmost digits form the least significant byte (LSB)
The LSB is placed at the lowest memory address, while the MSB is placed at the highest memory address

Big endian

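Opposite of the little endian format: the MSB is placed at the lowest memory address, while the LSB is placed at the highest memory address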

Example (little endian)

The following simple C program does the following:

Reads in a hex string as the 1st input

Converts the string into an integer with the strtoul function (base 16)
Prints the 4 bytes of that integer, in memory order, in hexadecimal format

Reads in 4 more bytes as the 2nd input, and treats them as raw bytes

Prints those 4 bytes in hexadecimal format

stack-exp.c

#include <stdio.h>
#include <stdint.h>
#include <unistd.h>
#include <stdlib.h>

int main(void) {
    char hexbuf[9] = {0};
    unsigned char raw[4];

    // read hex string
    read(0, hexbuf, 8);
    uint32_t num = strtoul(hexbuf, NULL, 16);

    printf("Bytes of 1st input in memory in hex format:\n");
    for (int i = 0; i < 4; i++)
        printf("%02x ", ((unsigned char *)&num)[i]);
    printf("\n");

    // read raw bytes
    read(0, raw, 4);

    printf("Bytes of 2nd input (raw bytes) in memory in hex format:\n");
    for (int i = 0; i < 4; i++)
        printf("%02x ", raw[i]);
    printf("\n");

    return 0;
}

Compile

$ gcc -o stack-exp stack-exp.c

The strtoul call shown below parses the input string as a number in the base given by its third argument:
a. base 16 (hex): "1234" -> 0x1234
b. base 10 (decimal): "1234" -> 1234
c. base 8 (octal): "1234" -> 01234

uint32_t num = strtoul(hexbuf, NULL, 16);

We can use the following command in a Bash shell to provide both inputs to the C program above:

1st input: the hexadecimal string "12345678" (parsed into the integer 0x12345678)
2nd input: the raw bytes b"\x12\x34\x56\x78"

$ python3 - << 'EOF' | ./stack-exp
import sys
sys.stdout.buffer.write(b"12345678")
sys.stdout.buffer.write(b"\x12\x34\x56\x78")
EOF

Bytes of 1st input in memory in hex format:
78 56 34 12
Bytes of 2nd input (raw bytes) in memory in hex format:
12 34 56 78

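Notice the difference in how the two inputs are stored in memory: the bytes of the integer appear reversed (least significant byte first), while the raw bytes keep their input order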

Note: the memory values shown are represented in hexadecimal format

Analysis with Ghidra

We can see that the first read() call writes to the local_19 variable (hexbuf in the source), while the second writes to the local_1d variable (raw in the source)

Analysis with GDB

The binary is compiled for the x86-64 architecture -> little endian

$ gdb stack-exp

gdb> starti
gdb> disass main
Dump of assembler code for function main:
   0x00005555555551e9 <+0>:     endbr64
   0x00005555555551ed <+4>:     push   rbp
   0x00005555555551ee <+5>:     mov    rbp,rsp
   0x00005555555551f1 <+8>:     sub    rsp,0x30
   0x00005555555551f5 <+12>:    mov    rax,QWORD PTR fs:0x28
   0x00005555555551fe <+21>:    mov    QWORD PTR [rbp-0x8],rax
   0x0000555555555202 <+25>:    xor    eax,eax
   0x0000555555555204 <+27>:    mov    QWORD PTR [rbp-0x11],0x0
   0x000055555555520c <+35>:    mov    BYTE PTR [rbp-0x9],0x0
   0x0000555555555210 <+39>:    lea    rax,[rbp-0x11]
   0x0000555555555214 <+43>:    mov    edx,0x8
   0x0000555555555219 <+48>:    mov    rsi,rax
   0x000055555555521c <+51>:    mov    edi,0x0
   0x0000555555555221 <+56>:    call   0x5555555550e0 <read@plt>
   0x0000555555555226 <+61>:    lea    rax,[rbp-0x11]
   0x000055555555522a <+65>:    mov    edx,0x10
   0x000055555555522f <+70>:    mov    esi,0x0
   0x0000555555555234 <+75>:    mov    rdi,rax
   0x0000555555555237 <+78>:    call   0x5555555550f0 <strtoul@plt>
   0x000055555555523c <+83>:    mov    DWORD PTR [rbp-0x24],eax
   0x000055555555523f <+86>:    lea    rax,[rip+0xdc2]        # 0x555555556008
   0x0000555555555246 <+93>:    mov    rdi,rax
   0x0000555555555249 <+96>:    call   0x5555555550b0 <puts@plt>
   0x000055555555524e <+101>:   mov    DWORD PTR [rbp-0x20],0x0
   0x0000555555555255 <+108>:   jmp    0x555555555283 <main+154>
   0x0000555555555257 <+110>:   mov    eax,DWORD PTR [rbp-0x20]
   0x000055555555525a <+113>:   cdqe
   0x000055555555525c <+115>:   lea    rdx,[rbp-0x24]
   0x0000555555555260 <+119>:   add    rax,rdx
   0x0000555555555263 <+122>:   movzx  eax,BYTE PTR [rax]
   0x0000555555555266 <+125>:   movzx  eax,al
   0x0000555555555269 <+128>:   mov    esi,eax
   0x000055555555526b <+130>:   lea    rax,[rip+0xdc4]        # 0x555555556036
   0x0000555555555272 <+137>:   mov    rdi,rax
   0x0000555555555275 <+140>:   mov    eax,0x0
   0x000055555555527a <+145>:   call   0x5555555550d0 <printf@plt>
   0x000055555555527f <+150>:   add    DWORD PTR [rbp-0x20],0x1
   0x0000555555555283 <+154>:   cmp    DWORD PTR [rbp-0x20],0x3
   0x0000555555555287 <+158>:   jle    0x555555555257 <main+110>
   0x0000555555555289 <+160>:   mov    edi,0xa
   0x000055555555528e <+165>:   call   0x5555555550a0 <putchar@plt>
   0x0000555555555293 <+170>:   lea    rax,[rbp-0x15]
   0x0000555555555297 <+174>:   mov    edx,0x4
   0x000055555555529c <+179>:   mov    rsi,rax
   0x000055555555529f <+182>:   mov    edi,0x0
   0x00005555555552a4 <+187>:   call   0x5555555550e0 <read@plt>
   0x00005555555552a9 <+192>:   lea    rax,[rip+0xd90]        # 0x555555556040
   0x00005555555552b0 <+199>:   mov    rdi,rax
   0x00005555555552b3 <+202>:   call   0x5555555550b0 <puts@plt>
   0x00005555555552b8 <+207>:   mov    DWORD PTR [rbp-0x1c],0x0
   0x00005555555552bf <+214>:   jmp    0x5555555552e8 <main+255>
   0x00005555555552c1 <+216>:   mov    eax,DWORD PTR [rbp-0x1c]
   0x00005555555552c4 <+219>:   cdqe
   0x00005555555552c6 <+221>:   movzx  eax,BYTE PTR [rbp+rax*1-0x15]
   0x00005555555552cb <+226>:   movzx  eax,al
   0x00005555555552ce <+229>:   mov    esi,eax
   0x00005555555552d0 <+231>:   lea    rax,[rip+0xd5f]        # 0x555555556036
   0x00005555555552d7 <+238>:   mov    rdi,rax
   0x00005555555552da <+241>:   mov    eax,0x0
   0x00005555555552df <+246>:   call   0x5555555550d0 <printf@plt>
   0x00005555555552e4 <+251>:   add    DWORD PTR [rbp-0x1c],0x1
   0x00005555555552e8 <+255>:   cmp    DWORD PTR [rbp-0x1c],0x3
   0x00005555555552ec <+259>:   jle    0x5555555552c1 <main+216>
   0x00005555555552ee <+261>:   mov    edi,0xa
   0x00005555555552f3 <+266>:   call   0x5555555550a0 <putchar@plt>
   0x00005555555552f8 <+271>:   mov    eax,0x0
   0x00005555555552fd <+276>:   mov    rdx,QWORD PTR [rbp-0x8]
   0x0000555555555301 <+280>:   sub    rdx,QWORD PTR fs:0x28
   0x000055555555530a <+289>:   je     0x555555555311 <main+296>
   0x000055555555530c <+291>:   call   0x5555555550c0 <__stack_chk_fail@plt>
   0x0000555555555311 <+296>:   leave
   0x0000555555555312 <+297>:   ret
End of assembler dump.

Experiment 1

I will attempt to find the stack addresses written by each of the read() calls, and compare them with the offsets shown in the variables portion in Ghidra:

gdb> start # load runtime addresses

1st read function


gdb> break *0x555555555221 # first read() function
gdb> run
gdb> info registers rsi # view the "rsi" register
rsi            0x7fffffffdaaf      140737488345775

The rsi register holds the second argument to the read() function, which is the stack address of the buffer that receives the input

gdb> delete breakpoints

2nd read function

gdb> break *0x00005555555552a4
gdb> run
gdb> info registers rsi
rsi            0x7fffffffdaab      140737488345771

Calculate the spacing between the two addresses

gdb> p/x 0x7fffffffdaaf - 0x7fffffffdaab
$x = 0x4

As shown in Ghidra earlier, we know that the read() calls write to the local_19 and local_1d variables, which are stored at Stack[-0x19] and Stack[-0x1d] respectively

gdb> p/x 0x1d - 0x19
$x = 0x4

The addresses shown in GDB and the offsets shown in Ghidra have the same relative spacing of 0x4

Experiment 2

In this experiment, I will attempt to understand how endianness affects the way data is written to the stack. We aim to compare how the value 0x12345678 is written to the stack when provided as input in 2 different formats:

a. Hexadecimal representation of an integer

b. Raw hexadecimal bytes

We can use the following simple Python script to create stack-exp-input.bin as the input to our program

1st input: the byte string b"12345678" -> converted by the C program into an integer with the hexadecimal representation 0x12345678
2nd input: the raw bytes b"\x12\x34\x56\x78"

$ python3 - << 'EOF' > stack-exp-input.bin
import sys
sys.stdout.buffer.write(b"12345678")
sys.stdout.buffer.write(b"\x12\x34\x56\x78")
EOF

1st input value

For the conversion of the 1st input, the program calls the strtoul() function, as shown by the following lines of assembly code at 0x0000555555555237:

0x0000555555555237 <+78>:    call   0x5555555550f0 <strtoul@plt>
0x000055555555523c <+83>:    mov    DWORD PTR [rbp-0x24],eax

The return value of <strtoul@plt> is not written to the stack by the call itself; it comes back in the rax register, and the mov instruction that follows copies its lower half (eax) to the stack at [rbp-0x24]. Hence, we need to set a breakpoint on the instruction after the mov operation, so that the value is already on the stack when execution stops
Relevant lines from Ghidra:

uVar1 = strtoul(local_19,(char **)0x0,0x10); // line 24
local_2c = (undefined4)uVar1; // line 25: value written into "local_2c variable"

gdb> break *0x000055555555523f # instruction after the mov
gdb> run < stack-exp-input.bin

Value in the stack after running

Take note of the entry at +0x0008, which contains the local_2c variable

─────────────────────────────────────────────────────────────────────────────────────────────────── stack ────
0x00007fffffffda90│+0x0000: 0x0000000000000000   ← $rsp
0x00007fffffffda98│+0x0008: 0x1234567800000000
0x00007fffffffdaa0│+0x0010: 0x0000000000000000
0x00007fffffffdaa8│+0x0018: 0x31007ffff7fe5af0
0x00007fffffffdab0│+0x0020: 0x0038373635343332 ("2345678"?)
0x00007fffffffdab8│+0x0028: 0xb0d75d6a627dda00
0x00007fffffffdac0│+0x0030: 0x00007fffffffdb60  →  0x00007fffffffdbc0  →  0x0000000000000000     ← $rbp
0x00007fffffffdac8│+0x0038: 0x00007ffff7c2a1ca  →  <__libc_start_call_main+007a> mov edi, eax

We can see that the local_2c variable is written at the address 0x00007fffffffda9c (the upper 4 bytes of the entry at 0x00007fffffffda98)

gdb> x/4xb 0x00007fffffffda9c
0x7fffffffda9c: 0x78    0x56    0x34    0x12

Notice that the bytes are stored on the stack in the order 0x78, 0x56, 0x34, 0x12 (lowest address first), instead of the order 0x12, 0x34, 0x56, 0x78 that we might expect from the hex representation of the integer

Explanation

In little-endian format, when the integer 0x12345678 is written to memory:

0x78 is the LSB, 0x12 is the MSB
- Hence, 0x78 is stored at the lowest memory address, while 0x12 is stored at the highest memory address

2nd input value

Relevant line from Ghidra:

read(0,local_1d,4); // line 32

gdb> break *0x00005555555552a9 
gdb> run < stack-exp-input.bin

Value in the stack after running

Take note of the entry at +0x0018, which contains the local_1d variable

─────────────────────────────────────────────────────────────────────────────────────────────────── stack ────
0x00007fffffffda90│+0x0000: 0x0000000000000000   ← $rsp
0x00007fffffffda98│+0x0008: 0x1234567800000000
0x00007fffffffdaa0│+0x0010: 0x0000000000000004
0x00007fffffffdaa8│+0x0018: 0x3178563412fe5af0
0x00007fffffffdab0│+0x0020: 0x0038373635343332 ("2345678"?)
0x00007fffffffdab8│+0x0028: 0x4685d039e1c30100
0x00007fffffffdac0│+0x0030: 0x00007fffffffdb60  →  0x00007fffffffdbc0  →  0x0000000000000000     ← $rbp
0x00007fffffffdac8│+0x0038: 0x00007ffff7c2a1ca  →  <__libc_start_call_main+007a> mov edi, eax

We can see that the local_1d variable is written at the address 0x00007fffffffdaab

gdb> x/4xb 0x00007fffffffdaab
0x7fffffffdaab: 0x12    0x34    0x56    0x78

Notice that the bytes are stored on the stack in the order 0x12, 0x34, 0x56, 0x78 (lowest address first), exactly as we input them

Explanation

When we write b"\x12\x34\x56\x78", each byte (2 hex digits) is stored exactly in the sequence given, because read() copies bytes without interpreting them as a number

0x12 is stored at the lowest memory address, while 0x78 is stored at the highest memory address


Last updated 1 month ago
