====== Program Execution Details ======
===== Program Startup =====
Upon startup, the executable file is loaded into main memory. Several executable file formats exist: Linux uses the Executable and Linkable Format (ELF), current versions of Microsoft Windows rely on the Portable Executable (PE) file format, and Mac OS X uses a format called Mach-O. As Linux is the primary target of this tutorial, the remaining startup description focuses on the execution of ELF files on Linux-based operating systems.
The ELF header stores the address of the entry point, conventionally the symbol ''_start'', which tells the operating system where to begin execution. C developers might wonder why the entry point is called ''_start'' and not ''main()'', which is what they are used to. Although the runtime environment of C programs is minimal in comparison to other languages, it still has to be set up, and this setup is part of each application's own code. To save C programmers the effort of writing setup routines by hand in every program, compilers link ready-made code that takes care of these tasks. This predefined code is called ''crt0'' and fills the functional gap between the raw execution entry point ''_start'' and the C entry point ''main()''(([[https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/i386/start.S|sourceware.org Git - glibc.git/blob - sysdeps/i386/start.S]]))(([[http://dbp-consulting.com/tutorials/debugging/linuxProgramStartup.html|Linux x86 Program Start Up]])).
:!: As a consequence of this startup sequence, command line arguments and environment variables are among the first values pushed to the stack of an application((Jeff Duntemann (2009). Assembly Language Step by Step: Programming with Linux)).
push ebp
mov ebp, esp
Remember that the stack grows from high memory addresses to low memory addresses. Allocating memory thus decreases the address of the top of the stack. By decreasing the value of the stack pointer ''ESP'' as shown below, ''n'' bytes of memory for local variables are allocated.
sub esp, n
At this point the stack setup is done and the actual body of the function is executed next. The base pointer can now be used to reference function parameters and local variables: parameters are found at positive offsets from ''EBP'', local variables at negative offsets. After the function body has finished, the stack pointer and base pointer are restored to their original values.
mov esp, ebp
pop ebp
After these instructions, the local data of the function lies below the stack pointer and is therefore no longer part of the stack, even though the bytes may still be present in memory.
Lastly, the return address is read from the stack and written to the instruction pointer register ''EIP'' by executing the ''ret'' instruction. It is not possible to directly assign a value to the instruction pointer register via a ''mov'' instruction.
ret
Execution continues with the code at the saved address. According to the ''cdecl'' calling convention, the return value is placed in the ''EAX'' register. The caller is then responsible for cleaning up the stack by removing the parameters it passed((Bruce Dang; Alexandre Gazet; Elias Bachaalany; Sébastien Josse (2014). Practical Reverse Engineering: x86, x64, ARM, Windows Kernel, Reversing Tools, and Obfuscation)).
The information on the stack belonging to a particular function invocation is called the stack frame of the function((Jon Erickson (2008). Hacking: The Art of Exploitation)).
#ifndef SHARED_H
#define SHARED_H
void shared();
#endif
// gcc -shared -m32 -g -O0 -o libshared.so shared.c
#include
Next, a simple application using the shared library is compiled.
// gcc -m32 -L. -g -O0 call_shared.c -lshared
#include "shared.h"
int main()
{
shared();
return 0;
}
Note that the ''-l'' flag of GCC expects library file names to start with the prefix ''lib'': ''-lshared'' makes the linker search for a file named ''libshared.so''. This is why the library was saved as ''libshared.so''.
The compilation succeeds without problems. Despite that, running the application yields the following error message.
$ ./a.out
./a.out: error while loading shared libraries: libshared.so: cannot open shared object file: No such file or directory
Inspecting the binary with ''ldd'' confirms that the shared library cannot be resolved.
$ ldd a.out
linux-gate.so.1 (0xf7f0b000)
libshared.so => not found
libc.so.6 => /lib32/libc.so.6 (0xf7d0d000)
/lib/ld-linux.so.2 (0xf7f0d000)
As the shared library is resolved at runtime, it must be located within one of the search paths of the dynamic loader. Without touching the global configuration, the user may extend these paths through the ''LD_LIBRARY_PATH'' environment variable(([[http://tldp.org/HOWTO/Program-Library-HOWTO/shared-libraries.html|Shared Libraries]])). The shared library is located in the current working directory, which is not searched by default. Setting the variable accordingly is sufficient to make the application run.
$ LD_LIBRARY_PATH=. ./a.out
shared
Now the application runs and terminates correctly.
Although the call to the function in the shared library is syntactically equivalent to a normal function call, the two differ in the way the function's code is located. GDB is used to analyze the call in detail.
First, the application is loaded into the debugger.
$ LD_LIBRARY_PATH=. gdb -q a.out
Reading symbols from a.out...(no debugging symbols found)...done.
Before starting the application, a breakpoint is set at the ''main()'' function.
(gdb) break main
Breakpoint 1 at 0x5e6: file call_shared.c, line 5.
After starting the application, the breakpoint is hit. Two further breakpoints are then set: one directly at the call to the ''shared()'' function and one at the symbol called ''shared@plt''.
(gdb) run
Starting program: /home/memory-corruption/a.out
Breakpoint 1, main () at call_shared.c:5
5 shared();
(gdb) set disassembly-flavor intel
(gdb) disassemble main
Dump of assembler code for function main:
0x565555cd <+0>: lea ecx,[esp+0x4]
0x565555d1 <+4>: and esp,0xfffffff0
0x565555d4 <+7>: push DWORD PTR [ecx-0x4]
0x565555d7 <+10>: push ebp
0x565555d8 <+11>: mov ebp,esp
0x565555da <+13>: push ebx
0x565555db <+14>: push ecx
0x565555dc <+15>: call 0x565555f9 <__x86.get_pc_thunk.ax>
0x565555e1 <+20>: add eax,0x1a1f
=> 0x565555e6 <+25>: mov ebx,eax
0x565555e8 <+27>: call 0x56555470
With execution stopped directly at the function call, stepping to the next instruction reveals a jump.
(gdb) ni
Breakpoint 3, 0x56555470 in shared@plt ()
(gdb) x/i $eip
=> 0x56555470
This code is located in the Procedure Linkage Table (PLT), which is the first level of indirection. The underlying concept is more complex, but informally the PLT can be considered a jump table for functions in dynamically linked libraries. The description does not go into further detail here, as it is not relevant for the topics covered by this tutorial. The important point to remember is that calling a function located in a shared library involves an indirection and that the PLT contains the jumps to these functions.