Endianness, stack growth (GDB & Ghidra experiments)

Stack growth direction

a. Stack growth direction is independent of endianness and depends only on the architecture

  • On x86 and x86-64, the stack grows from higher to lower memory addresses

Endianness

a. The byte order the architecture uses when a multi-byte value is stored in memory and read back (this applies to all memory, not only the stack)

b. Decides whether the least significant or the most significant byte is stored at the lowest address

NOTE: Endianness only applies when bytes are interpreted as multi-byte numbers (integers, pointers, floats)

  • Strings are not affected by endianness because they are sequences of single bytes, not multi-byte numbers

Little endian

Given a number written in the usual positional notation (e.g. hexadecimal 0x12345678):

  • The leftmost byte of the written value is the most significant byte (MSB), and the rightmost is the least significant byte (LSB)

  • In little endian, the MSB is placed at the highest memory address, while the LSB is placed at the lowest memory address

Big endian

  • The opposite of little endian: the MSB is placed at the lowest memory address and the LSB at the highest

Example (little endian)

The simple C program shown below does the following:

  1. Reads in a hex string as the 1st input

  • Parses the string as a base-16 (hex) number with the strtoul() function

  • Prints the first 4 bytes of the resulting integer in hexadecimal format

  2. Reads another string as the 2nd input and treats it as raw bytes

  • Prints the first 4 characters in hexadecimal format

Compile

  • The function shown below converts the string input into a number according to the base it is given:

  • a. base 16 (hex): "1234" -> 0x1234 (4660 in decimal)

  • b. base 10 (decimal): "1234" -> 1234

  • c. base 8 (octal): "1234" -> 01234 (668 in decimal)

We can use the following command in a Bash shell to provide the inputs to the C program above:

  1. Hexadecimal string: 0x12345678

  2. Raw bytes: b"\x12\x34\x56\x78"

  • Notice the difference in how the two input values are stored in memory

Note: the memory values shown are represented in hexadecimal format

Analysis with Ghidra

  • We can see that the first read() call writes to the local_19 variable, while the second writes to the local_1d variable

Analysis with GDB

The binary is compiled for the x86-64 architecture, which is little endian

Experiment 1

I will attempt to understand the memory addresses written to by each of the read() calls, and compare them with the stack offsets shown for the local variables in Ghidra:

1st read function

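  • The rsi register holds the second argument to read(), which is the address in the stack where the input is stored (under the System V x86-64 calling convention, rdi holds the 1st argument, the file descriptor, and rdx the 3rd, the byte count)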

2nd read function

Calculate the spacing between the two addresses

  • As shown in Ghidra earlier, the read() calls write to the local_19 and local_1d variables, which are stored at Stack[-0x19] and Stack[-0x1d] respectively

  • The addresses shown in GDB and the offsets shown in Ghidra have the same relative spacing of 0x4 bytes

Experiment 2

In this experiment, I will attempt to understand how endianness affects the way data is written into the stack. We aim to compare how the hexadecimal sequence 0x12345678 is written into the stack when provided as input in 2 different formats:

a. Hexadecimal representation of an integer

b. Raw hexadecimal bytes

We can use the following simple Python script to create stack-exp-input.bin as the input to our program

  • 1st input: the byte string "12345678", which the C program later converts to the integer 0x12345678

  • 2nd input: the raw bytes \x12\x34\x56\x78 (the value 0x12345678 written byte by byte)

1st input value

For the conversion of the 1st input, we use the strtoul() function, called at the following line of assembly code at 0x0000555555555237:

  • The return value of <strtoul@plt> is not written to the stack by this instruction; it is returned in the rax register, and the following mov copies its lower 32 bits (eax) into the stack slot. Hence, we need to set a breakpoint on the line after that mov instruction

  • Relevant line from Ghidra:

  • Value in the stack after running

Take note of the commented line where the local_2c variable is written

  • We can see that the local_2c variable is written to address 0x00007fffffffda9c

  • Notice that, reading from the lowest address upward, the bytes are stored in the order 0x78, 0x56, 0x34, 0x12, instead of 0x12, 0x34, 0x56, 0x78 as the hex-represented integer is written

Explanation

Given the little-endian format, when 0x12345678 is stored:

  • 0x78 is the LSB and 0x12 is the MSB

    • Hence, 0x78 is stored at the lowest address (0x00007fffffffda9c), while 0x12 is stored at the highest (0x00007fffffffda9f)

2nd input value

  • Relevant line from Ghidra:

  • Value in the stack after running

Take note of the commented line where the local_1d variable is written

  • We can see that this time the local_1d variable (Stack[-0x1d] in Ghidra) is written, at a different stack address from local_2c

  • Notice that the bytes are stored in the order 0x12, 0x34, 0x56, 0x78, exactly as we input them

Explanation

When we provide b"\x12\x34\x56\x78", the bytes (2 hex digits each) are stored exactly in the sequence given, since they are copied as raw bytes and never interpreted as a multi-byte number

  • 0x12 is stored at the lowest address, while 0x78 is stored at the highest
