[------------------------------------------------------------------------]
[-- Uninformed Research -- informative information for the uninformed. --]
[------------------------------------------------------------------------]
[-- Genre : Development --]
[-- Name : needle --]
[-- Desc : Linux x86 run-time process manipulation --]
[-- Url : http://www.uninformed.org/ --]
[-- Use : EVILNESS --]
[------------------------------------------------------------------------]
[-- Author : skape (mmiller@hick.org) --]
[-- Date : 01/19/2003 --]
[------------------------------------------------------------------------]

[-- Table of contents: --]

  1) Overview
     1.1) Topics
     1.2) Techniques
     1.3) Execution Diversion
  2) Memory Allocation
  3) Memory Management
  4) Library Injection
  5) Code Injection
     5.1) Forking
     5.2) Threading
     5.3) Function Trampolines
  6) Conclusion
  7) References

[-- 1) Overview --]

So, you want to be evil and modify the image of an executing process? Well, perhaps you've come to the right place. This document deals strictly with some methodologies used to alter process images under Linux. If you're curious about how to do something similar to the things listed in this document in Windows, please read the ``References`` section.

[-- 1.1) Topics --]

The following concepts will be discussed in this document as they relate to run-time process manipulation:

  * Memory Allocation

    The ability to allocate and deallocate memory in a running process from another process is extremely powerful in scenarios such as execution diversion (the act of diverting a process's execution to your own code), data hiding (the act of hiding data in a process image), and even, in some cases, allocating dynamic structures/strings for use within a process during its normal execution. These aren't the only uses, but they're all I could think of right now :). See the ``Memory Allocation`` section for details.
  * Memory Management

    The ability to copy arbitrary memory from one process to another at arbitrary addresses allows for flexible manipulation of a given process's memory image. This can be applied to copy strings, functions, integers, everything. See the ``Memory Management`` section for details.

  * Library Injection

    The ability to inject arbitrary shared objects into a process allows for getting at symbols that an executable would not normally have, as well as allowing an evil-doer such as yourself to inject arbitrary position-independent code (PIC) that can reference symbols in an executable without getting in trouble. This alone is extremely powerful. See the ``Library Injection`` section for details.

  * Code Injection

    Well, when you get down to it, you just want to execute code that you define in a given process, and you want to control when it gets executed. Lucky for you, this is possible AND just as powerful as you'd hoped. This document will cover three types of code injection:

    1) Forking
       The act of causing a process to create a child image and execute arbitrary code.

    2) Threading
       The act of causing a process to create a thread that executes an arbitrary function.

    3) Function Trampolines
       The act of causing a call to a given function to 'trampoline' to arbitrary code and then 'jump' back to the original function.

[-- 1.2) Techniques --]

As of this document I'm aware of two plausible techniques for altering the image of an executing process:

  * ptrace

    Likely the most obvious technique, the ptrace (process trace) API allows for writing memory, reading memory, reading and setting registers, as well as single-stepping through a process. The application of these things as it pertains to this document should be obvious. If not, or if you're curious, read the ``References`` section for more details on ptrace.

  * /proc/[pid]/mem

    This technique is more limited in what it can do, but it is by no means something that should be cast aside.
    With the ability to read/write a given process's image, one could easily modify the image to do ``Code Injection``. Doing things like memory allocation, management, and library injection via this method is quite a bit harder but *NOT* impossible. It would take a decent amount of hackery though. (Theoretical, not proven yet, by me at least.)

[-- 1.3) Execution Diversion --]

In order to do most of the techniques in this document we need to divert the execution of a running process to code that we control. This presents a few problems right off the bat. Where can we safely put the code that we want executed? How could we possibly change the course of execution? How do we restore execution once our code has finished? Well, thankfully, there are answers to these questions, and they're pretty easy to answer. Let's start with the first one.

  * Where can we safely put the code that we want executed?

Well, to answer this question you need to have a slight understanding of how the process is laid out and how the flow of execution goes. The basic facts you need in your knowledge base are that executables have symbols, and that symbols map to vmas (virtual memory addresses) that tell the vm where the symbols should be located in memory. This is used not only for functions, but also for global variables. With that said, we can tell where code will be in an executable by processing the ELF image associated with the process. Example:

    root@rd-linux:~# objdump --syms ./ownme | grep main
    08048450 g F .text 00000082 main

This tells us that main will be found at 0x08048450 when the program is executing. But what good does this do us? A lot. Considering the main function is the 'gateway' to normal code execution, it's an excellent place to use as a dumping zone for arbitrary code. There is a restriction, however: the injected code has to fit within the space the function occupies.
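As a rough illustration of that size restriction, the symbol table line above already carries what is needed: the address in the first column and the size (in hex) just before the name. The sketch below is hypothetical helper code, not part of any tool (the names parse_symbol_line and fits are made up), and it assumes the `objdump --syms` line layout shown above.

```python
def parse_symbol_line(line):
    """Parse one `objdump --syms` line into (name, address, size).

    Assumes the layout shown in the text: address first, then flag
    characters and the section, then size and name last. The flag
    column may contain spaces, so fields are taken from both ends.
    """
    fields = line.split()
    address = int(fields[0], 16)
    size = int(fields[-2], 16)
    name = fields[-1]
    return name, address, size


def fits(stub_length, symbol_size):
    """True if a stub of stub_length bytes fits inside the symbol."""
    return 0 < stub_length <= symbol_size


name, address, size = parse_symbol_line("08048450 g F .text 00000082 main")
print(name, hex(address), hex(address + size), size)
```

Here main spans 0x82 (130) bytes, so any stub dumped over it has to stay under that length.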
Here's the preamble and some code from main in ./ownme:

    root@rd-linux:~# objdump --section=.text \
        --start-address=0x08048450 --stop-address=0x080484d4 \
        -d ./ownme

    ./ownme: file format elf32-i386

    Disassembly of section .text:

    08048450 <main>:
     8048450: 55          push %ebp
     8048451: 89 e5       mov %esp,%ebp
     8048453: 83 ec 08    sub $0x8,%esp
     8048456: 90          nop
     8048457: 90          nop
     8048458: 90          nop
     ...
     80484d0: c9          leave
     80484d1: c3          ret

Granted, main isn't always the entry point, but it's easy to find out what is by looking at the e_entry attribute of the ELF header. Now, the reason I say main is a great place to use as a dump zone is that it holds code that will _never be accessed again_. This is the key. There are lots of other places you could use as a dump zone. For instance, if the application contains a large help banner, you could put code over the help banner, considering the banner won't be printed ever again once the program is executing. Use your imagination, you'll think of lots more. 'main' is the most generic choice, since it's guaranteed to be in every application.

Well, now we know where we can safely put code to be executed, but how do we actually execute it?

  * How could we possibly change the course of execution?

In order to change the course of execution in a process you need some working knowledge of ptrace and how the vm traverses an executable. Assuming you have both, read on.

On x86 there is a vm register, eip, used to hold the vma of the NEXT instruction. Once an instruction finishes, the vm processes the instruction at eip and increments eip by the size of that instruction. There are some instructions, such as jmp and call, which are themselves execution diversion instructions that cause eip to be changed to the address specified in the operand. We use this same principle when it comes to changing our course of execution to what we want.

Now, let's say that we theoretically put some of our own code at 0x08048450 (the address of main above) using the functionality from the ``Memory Management`` section. In order to have our code get executed (since it would normally never get executed) we use ptrace's PTRACE_SETREGS and PTRACE_GETREGS functionality.
These two requests allow a third-party process to obtain and set the registers of another process. These registers include eip. In order to change the execution we perform the following steps:

  1) call PTRACE_GETREGS to obtain the 'current' set of registers.
  2) set eip in the returned set of registers to 0x08048450 (the address of our code).
  3) call PTRACE_SETREGS with our modified structure.
  4) continue the course of execution.

We've now successfully caused our code to be executed, but there's a problem. We injected a small chunk of code that we wanted to be run, but then we wanted the process to return to normal execution. That brings us to the next question.

  * How do we restore execution once our code has finished?

Glad you asked, because this is the most important part. In order to restore execution we need to modify our injected code just a bit. We do this by adding an instruction near the end:

    int $0x3

This instruction is used on Linux (and Windows) to signal an exception or breakpoint to the active debugger. In the case of Linux, it raises a SIGTRAP, which, if the process is being traced, will be reported to the tracer by wait().

Okay, so we've modified our code and let's say it looks something like this:

    nop
    nop
    nop
    nop
    nop
    nop
    mov $0x1, %eax
    int $0x3
    nop

The code is set up with a 6 byte nop pad at the top to make our changing of eip cleaner (and safer) due to the way the vm reacts to our execution diversion. The movement of 1 into eax is just an example of our arbitrary code. The int $0x3 alerts our attached debugger (ptrace) and the nop is for padding so we can see when we hit the end of our code.

Okay, that's a lot of stuff. Let's walk through our modified process of execution now. This assumes you've already injected your code at main (0x08048450):

  1) call PTRACE_GETREGS to obtain the 'current' set of registers.
  2) save these registers in another structure. This is used for restoration.
  3) set eip in the returned set of registers to 0x08048450 (the address of our code).
  4) call PTRACE_SETREGS with the modified structure.
  5) continue execution, but watch for signals with the wait() function. If the wait function reports that the process stopped on a signal:
     a) call PTRACE_GETREGS and get the current set of registers.
     b) if eip is equal to the address of your injected code plus its size minus 1 (the location of the nop at the end), you know you've reached the end of your code. Go to step 6 at this point.
     c) otherwise, continue executing.
  6) at this point your code has finished. Call PTRACE_SETREGS with the saved structure from step 2 and you're finished. You've successfully diverted and reverted execution.

That was a mouthful, but it's very important that it's understood. All of the topics in this document employ this underlying logic to perform their actions. Each one has a 'stub' assembly function that gets injected into a process at main to be executed. This code is meant to be small due to the fact that there are potential size issues.

Oh, and another thing: you have full control over every register in this scenario, because the registers are restored with PTRACE_SETREGS before the 'normal' execution continues.

[-- 2) Memory Allocation --]

Memory allocation is one of the key features in this document, as all of the subtopics in Execution Diversion are dependent on its functionality. Memory allocation allows for dynamic memory allocation in another process (duh). The most applicable scenario with regards to this document is the storage of arbitrary code in memory without size limitations. This allows one to inject a very large function for execution without fear of overrunning into another function or some other harmful spot.

Memory allocation is relatively simple, but understanding how to get from a to b requires a bit of explaining.
The first thing we need to do is figure out where malloc will be in a given process image so that we may call into it. If we can figure that out we should be home free considering what we know from section 1.3. Realize that all of the steps below can be, and are, easily automated, but for the sake of knowing, here they are:

  1) Where could malloc possibly be? Well, let's see what our choices are:

         root@rd-linux:~# ldd ./ownme
             libc.so.6 => /lib/libc.so.6 (0x40016000)
             /lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x40000000)
         root@rd-linux:~# objdump --dynamic-syms --section=.text \
             /lib/libc.so.6 | grep malloc
         0006df90 w DF .text 00000235 GLIBC_2.0 malloc
         root@rd-linux:~# objdump --dynamic-syms --section=.text \
             /lib/ld-linux.so.2 | grep malloc
         0000c8f0 w DF .text 000000db GLIBC_2.0 malloc

     Alright, so we've got malloc in both libc and ld-linux. We could probably use either, but what about programs that don't use libc? In order to be the most flexible, we should use ld-linux. This also has a positive side effect, which is that every dynamically linked ELF binary has an 'interpreter', and it just so happens that ld-linux is that interpreter.

  2) Alright, so we know the vma of malloc is at 0x0000c8f0, but that doesn't exactly look like a valid vma. That's because it's not. It's an offset. The actual vma can be calculated by adding the base address from ldd for ld-linux (0x40000000) to the offset (0x0000c8f0), which in turn produces the full vma 0x4000c8f0. Now we know exactly where malloc is.

  3) Cool, so we know where malloc is. Now all we need to do is divert execution to some code that calls it and revert back. We also need the return value from malloc, though, so we know where our newly allocated buffer is. Fortunately, this is quite easy with PTRACE_GETREGS: eax will hold the return value (cdecl). The code is pretty simple and, considering we control all the registers, we can use them to pass arguments, such as size, into our code at the time of diversion.
     Here's some code that will, when diverted to with the correctly initialized registers, call malloc and interrupt into the debugger:

         nop                 # nop pads
         nop
         nop
         nop
         nop
         nop
         push %ebx           # push the size to allocate onto the stack
         call *%eax          # call malloc
         add $0x4, %esp      # restore the stack
         int $0x3            # breakpoint
         nop

     The above code expects the 'size' parameter in ebx and the address of malloc in eax.

  4) Alrighty, so now we've executed our code and we're ready to restore the process to normal execution. But wait, we need the address malloc returned. We simply use PTRACE_GETREGS and save eax before restoring the registers, and we've successfully allocated memory in another process, and we have the address to prove it.

The same steps above can be used for deallocating memory; simply s/malloc/free/g and you're set :).

[-- 3) Memory Management --]

I'm only going to briefly cover the concept of copying memory from one process to another, as it's sort of out of the scope of this document. If you're more curious, read about memgrep in the ``References`` section.

Copying memory from one process to another simply entails the use of PTRACE_POKEDATA, which writes one word (4 bytes on x86) of data to a given address inside a process. Larger buffers are copied by repeating the call one word at a time. Not much more needs to be known from that point on :).

[-- 4) Library Injection --]

Library injection is very powerful when it comes to using functionality inside a running process that it wasn't meant to use. One of the more obvious applications is that of loading a personally developed shared object into a running executable. This one was fun to figure out, so I'll just kind of walk you through the process I took.

First things first, we need to figure out how to load a library without the binary being linked to libdl. libdl is what provides functions like dlopen(), dlsym(), and dlclose(). The problem is that executables don't link to this library by default.
That means we can't do our magic technique of figuring out where dlopen will be in memory because, well, it isn't guaranteed to be there. There's still hope though. The dl* functions are mainly just stubs that make calling the underlying API easier, kind of like how libc makes calling syscalls easier. Since these are just wrappers, there have to be implementers, and indeed, there are. Check this out:

    root@rd-linux:~# objdump --dynamic-syms /lib/libc.so.6 | \
        grep _dl_ | egrep "open|close|sym"
    000f7d10 g DF .text 000001ad GLIBC_2.2 _dl_vsym
    000f6f10 g DF .text 000006b8 GLIBC_2.0 _dl_close
    000f6d80 g DF .text 00000190 GLIBC_2.0 _dl_open
    000f7c00 g DF .text 0000010d GLIBC_2.2 _dl_sym

Well, isn't it our lucky day? libc.so.6 has _dl_open, _dl_sym, and _dl_close. These look amazingly similar to their dl* wrappers. In fact, they're almost exactly the same. Compare the prototypes:

    extern void *dlopen (const char *file, int mode);
    extern void *dlsym (void *handle, const char *name);
    extern int dlclose (void *handle);

To:

    void *_dl_open (const char *file, int mode, const void *caller);
    void *_dl_sym (void *handle, const char *name, void *who);
    void _dl_close (void *_map);

Pretty much the same, right? Looks very promising. So here's what we know as of now:

  * We know where the _dl_* symbols will be in the process's virtual memory. (We can calculate it the same way we did for malloc.)
  * We know the prototypes.

One thing we don't know is how the functions expect their arguments. One would think they'd be stack based, right? Well, not so. They seem to use a variation of fastcall (like syscalls).
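The address calculation mentioned in the first point is plain addition: the base address ldd reports for a library plus the symbol's offset from objdump. Here is a minimal sketch using only numbers that appear in this document. The helper names are made up for illustration, and it assumes the library really is mapped at the base ldd printed, which is worth confirming against /proc/[pid]/maps for the target process.

```python
import re


def base_from_ldd_line(line):
    """Pull the load address out of one line of ldd output."""
    match = re.search(r"\(0x([0-9a-fA-F]+)\)\s*$", line)
    if match is None:
        raise ValueError("no load address in: %r" % line)
    return int(match.group(1), 16)


def resolve(base, offset):
    """Absolute address of a symbol = library base + symbol offset."""
    return base + offset


ld_base = base_from_ldd_line(
    "/lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x40000000)")
libc_base = base_from_ldd_line("libc.so.6 => /lib/libc.so.6 (0x40016000)")

# Offsets copied from the objdump --dynamic-syms output in this document.
malloc_addr = resolve(ld_base, 0x0000c8f0)     # malloc in ld-linux
dl_open_addr = resolve(libc_base, 0x000f6d80)  # _dl_open in libc
dl_sym_addr = resolve(libc_base, 0x000f7c00)   # _dl_sym in libc
dl_close_addr = resolve(libc_base, 0x000f6f10) # _dl_close in libc

for name, addr in [("malloc", malloc_addr), ("_dl_open", dl_open_addr),
                   ("_dl_sym", dl_sym_addr), ("_dl_close", dl_close_addr)]:
    print("%-10s 0x%08x" % (name, addr))
```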
Here's a short dump of _dl_open:

    000f6d80 <.text+0xdde00> (_dl_open):
       f6d80: 55                   push %ebp
       f6d81: 89 e5                mov %esp,%ebp
       f6d83: 83 ec 2c             sub $0x2c,%esp
       f6d86: 57                   push %edi
       f6d87: 56                   push %esi
       f6d88: 53                   push %ebx
       f6d89: e8 00 00 00 00       call 0xf6d8e
       f6d8e: 5b                   pop %ebx
       f6d8f: 81 c3 ba 10 02 00    add $0x210ba,%ebx
       f6d95: 89 c7                mov %eax,%edi
       f6d97: 89 d6                mov %edx,%esi
       f6d99: 89 4d e4             mov %ecx,0xffffffe4(%ebp)
       f6d9c: f7 c6 03 00 00 00    test $0x3,%esi
       f6da2: 75 1c                jne 0xf6dc0
       f6da4: 83 c4 f4             add $0xfffffff4,%esp

Looks pretty normal for the most part, right? Well, up until 0xf6d95 at least. It's quite odd that it's referencing eax, edx, and ecx, which have not been initialized in the context of _dl_open, and then operating on them later in the function. Very strange to say the least. Unless, of course, the arguments are being passed in registers instead of via the stack. Let's look at the source code for _dl_open.

    void *
    internal_function
    _dl_open (const char *file, int mode, const void *caller)
    {
      struct dl_open_args args;
      const char *objname;
      const char *errstring;
      int errcode;

      if ((mode & RTLD_BINDING_MASK) == 0)
        /* One of the flags must be set. */
        _dl_signal_error (EINVAL, file, NULL,
                          N_("invalid mode for dlopen()"));
      ....
    }

Okay, so we see that roughly the first thing it does is a bitwise and on the mode passed in to make sure it's valid. It does the and with 0x00000003 (RTLD_BINDING_MASK). Do we see any bitwise ands with 0x3 in the disasm? We sure do. At 0xf6d9c a test (a bitwise and that only sets flags) is performed between $0x3 and esi. So esi must be where our mode is stored, right? Yes. Let's see where esi is set. Looks like it gets set at 0xf6d97 from edx. Okay, so maybe edx originally contained our mode. Where does edx get set? Nowhere in _dl_open. That means the mode must have been passed in a register, and not on the stack.
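To make the check concrete, here is a small sketch of the test that the source performs and that the `test $0x3,%esi` instruction implements. The flag values are the ones glibc defines in its dlfcn headers (RTLD_LAZY = 1, RTLD_NOW = 2), copied here as plain constants. The function name mode_is_valid is made up for illustration.

```python
RTLD_LAZY = 0x00001          # value as defined by glibc
RTLD_NOW = 0x00002           # value as defined by glibc
RTLD_BINDING_MASK = 0x3      # the mask used in the _dl_open source above


def mode_is_valid(mode):
    """Mirror of the first check in _dl_open: a binding flag must be set."""
    return (mode & RTLD_BINDING_MASK) != 0


for mode in (0, RTLD_LAZY, RTLD_NOW, 0x80000000, RTLD_NOW | 0x80000000):
    print("0x%08x -> %s" % (mode, mode_is_valid(mode)))
```

Any mode with neither of the two low bits set fails the check, no matter what the higher bits contain.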
If you do some more research, you determine that the arguments are passed as such:

    eax = library name (ex: /lib/libc.so.6)
    ecx = caller (ex: ./ownme)
    edx = mode (ex: RTLD_NOW | 0x80000000)

Alright, so we know how arguments are passed AND we know the address to call when we want to load a library. From this point things should be pretty obvious. All one need do is allocate space for the library name and the caller in the image using the ``Memory Allocation`` technique. Then copy the library name and caller name into that space using the ``Memory Management`` technique. Then, finally, execute the stub code that loads the library. That code would look something like this:

    nop                 # nop pads
    nop
    nop
    nop
    nop
    nop
    call *%edi          # call _dl_open
    int $0x3            # breakpoint
    nop

This code expects the arguments to already be initialized in the proper registers, as determined above, and it expects _dl_open's vma to be in edi.

Welp, we've successfully injected a shared object into another process's image. What you do from here is up to the desired outcome. Calling _dl_sym and _dl_close uses the same code as above, but their arguments are as follows:

    _dl_sym expects:
      eax = library handle opened by _dl_open
      edx = symbol name (ex: 'pthread_create')

    _dl_close expects:
      eax = library handle opened by _dl_open

[-- 5) Code Injection --]

I must say we're getting rather hardcore: we can allocate memory, copy memory, and load shared objects into arbitrary processes. What more could we possibly want? How about some arbitrary, controlled code execution that isn't limited by size? Sounds spiffy!

[-- 5.1) Forking --]

Let's say we want to fork a child process inside the context of another process and have it execute an arbitrary function that we've allocated and stored in the process's memory image via the ``Memory Allocation`` and ``Memory Management`` methods. Doing the fork is as simple as writing up some code that will use ``Execution Diversion`` to fork the child and return control to the parent as if nothing happened.
An example of forking and executing a supplied function is as follows:

    nop                 # nop pads
    nop
    nop
    nop
    nop
    nop
    mov $0x2, %eax      # fork syscall
    int $0x80           # interrupt
    cmp $0x00, %eax     # is the pid stored in eax 0? if so,
                        # we're the child
    jne fork_finished   # since eax wasn't zero, it means we're the
                        # parent. jmp to finished.
    push %ebx           # since we're the child, we push the
                        # argument
    call *%edi          # then we call the function
    mov $0x1, %eax      # exit the child process
    int $0x80           # interrupt
    fork_finished:
    int $0x3            # we're the parent, we breakpoint.
    nop

This code expects the following registers to be set:

    ebx = the argument to be passed to the function
    edi = the vma of the function to call in the context of the child.

Forking is really as simple as that. Now, one side effect is that if the daemon does not expect forked children (i.e., it doesn't call wait()) then your child process will show up as defunct when it exits, due to not being cleaned up properly. There are ways around this, though. You could use the ``Execution Diversion`` technique to perform cleanup of exited children on behalf of the process.

[-- 5.2) Threading --]

Similar to forking, but different in that a thread runs in the context of the caller and shares memory, threading allows for pretty much the same things that forking does. There are some risks with threading though. For instance, it is _NOT_ safe to create a thread in a process that does not naturally thread. This is for multiple reasons, the most important being that the threading environment is set up at load time (in the case of pthreads). If Linux didn't use some hacky application-level threading architecture, things wouldn't be so bad.
If you really do want to take the risk of creating a thread, the process would be something like this:

  1) Inject libpthread.so into the process (``Library Injection``)
  2) Find pthread_create's vma in the process (``Library Injection``)
  3) Allocate and copy user defined code (``Memory Allocation``)
  4) Perform ``Execution Diversion`` on the stub code to create the thread.

An example of such code is:

    nop                 # nop pads
    nop
    nop
    nop
    nop
    nop
    sub $0x4, %esp      # space for the id
    mov %esp, %ebp      # store esp in ebp for pushing
    push %ebx           # push argument
    push %eax           # push function
    push $0x0           # no attributes
    push %ebp           # push addr to store thread id in
    call *%edi          # call pthread_create
    add $0x14, %esp     # restore stack
    int $0x3            # breakpoint
    nop

Like I said, threading is dangerous. Know your program before attempting to inject a thread. You will get odd results if you inject a thread into a process that doesn't naturally thread.

[-- 5.3) Function Trampolines --]

Function trampolines are a great way to transparently hook arbitrary functions in memory. I'll give a brief overview of what a function trampoline is and how it works.

The basic gist of how function trampolines work is that they overwrite the first x instructions of a function, where the size of the x instructions is at least six bytes. The six bytes come from the fact that on x86 the indirect absolute jump used here (jmp *addr) takes up 6 bytes: 2 bytes of opcode followed by a 4 byte address. The x instructions are replaced with the jmp instruction, which jumps to an address in memory that contains the injected function. This function runs before the actual function runs, and thus has complete control over whether the actual function even gets called. At the end of the injected function the x instructions are appended, as well as a jump back to the original function plus the size of the x instructions.

Here's an example: Let's say we want to hook the function 'testFunction' in the executable 'ownme'.
    root@rd-linux:~# objdump -d ownme --start-address=0x080484d4

    ownme: file format elf32-i386

    Disassembly of section .init:
    Disassembly of section .plt:
    Disassembly of section .text:

    080484d4 <testFunction>:
     80484d4: 55          push %ebp
     80484d5: 89 e5       mov %esp,%ebp
     80484d7: 83 ec 18    sub $0x18,%esp
     ...
     8048500: c9          leave
     8048501: c3          ret

Well, it looks like the first 3 instructions match our criteria of at least 6 bytes (1 + 2 + 3 bytes). Let's keep those 6 bytes of opcodes tucked away for now.

We need to be smart here. We're going to use a jmp that says 'jump to the address stored at address x'. We're also going to want to jump back to the original function afterwards. That means when we allocate our memory we should allocate it in a format like this:

    [ 4 bytes storing the address of our code ]
    [ 4 bytes storing the address to jmp back to ]
    [ X bytes of arbitrary code ]
    [ X bytes containing the X instructions that we overwrote ]
    [ 6 bytes for the jump back ]

So let's say we want to inject this code, and we allocated a buffer of the appropriate length in the process which starts at 0x41414140:

    nop
    movb $0x1, %al

Our actual buffer in memory would look something like this:

    0x41414140 = 0x41414148 (address of our code)
    0x41414144 = 0x080484da (address to jmp back to: testFunction + 6)
    0x41414148 = 3 bytes (nop, movb)
    0x4141414B = 6 bytes of preamble from testFunction
    0x41414151 = jmp *0x41414144

The last step, now that we have our code injected, is to overwrite the actual preamble (the first 6 bytes of testFunction) with the jmp to our code. The assembly would look something like this:

    jmp *0x41414140     # Jump to the address stored in 0x41414140

Once that's overwritten, we're home free. The flow of execution goes like this:

  1) Call to testFunction
  2) First instruction of testFunction is: jmp *0x41414140
  3) vm jumps to 0x41414148 and executes:

         nop
         movb $0x1, %al
         push %ebp
         mov %esp, %ebp
         sub $0x18, %esp
         jmp *0x41414144

  4) vm jumps to 0x080484da
  5) Function executes like normal.

That's all there is to it.
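The buffer layout described above can be worked out mechanically from the lengths of its parts. The following is a hedged sketch rather than the author's tool: the function names are hypothetical, and it only computes addresses and bytes without touching any process. It relies on the x86 encoding of `jmp *addr` as the bytes FF 25 followed by the 32-bit little-endian address (6 bytes in total), and uses the opcode bytes shown in the testFunction disassembly.

```python
import struct


def jmp_indirect(pointer_addr):
    """Encode `jmp *pointer_addr`: FF 25 + 32-bit address (6 bytes)."""
    return b"\xff\x25" + struct.pack("<I", pointer_addr)


def build_trampoline(buf_addr, func_addr, hook_code, preamble):
    """Return (buffer bytes, layout dict) for the allocation described."""
    code_addr = buf_addr + 8                    # after the two pointers
    saved_addr = code_addr + len(hook_code)     # copied preamble
    jmp_back_addr = saved_addr + len(preamble)  # final jmp
    resume_addr = func_addr + len(preamble)     # first untouched byte
    buf = (struct.pack("<I", code_addr)         # pointer to our code
           + struct.pack("<I", resume_addr)     # pointer to jmp back to
           + hook_code
           + preamble
           + jmp_indirect(buf_addr + 4))        # jmp *(return pointer)
    layout = {"code": code_addr, "saved": saved_addr,
              "jmp_back": jmp_back_addr, "resume": resume_addr}
    return buf, layout


hook_code = b"\x90\xb0\x01"             # nop; movb $0x1, %al
preamble = b"\x55\x89\xe5\x83\xec\x18"  # first 6 bytes of testFunction
buf, layout = build_trampoline(0x41414140, 0x080484d4, hook_code, preamble)
patch = jmp_indirect(0x41414140)        # bytes that replace the preamble

for key in ("code", "saved", "jmp_back", "resume"):
    print("%-8s 0x%08x" % (key, layout[key]))
```

Note that copying the preamble verbatim is only safe when the overwritten instructions contain no eip-relative operands (a relative call or jmp, for instance), since those would resolve to a different target when run from the new location.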
There are a couple of restrictions when using trampolines:

  1) NEVER modify the stack without restoring it before the original function's preamble gets called. Bad things will happen.
  2) Be careful which registers you modify. Some functions may use fastcall.

For more information on function trampolines, see the ``References`` section.

[-- 6) Conclusion --]

That about wraps it up. You now have the tools to allocate memory, copy memory, inject libraries, create forks, create threads, and install function trampolines. You also have the underlying concept of ``Execution Diversion``, which can be applied across the board to even more things I haven't even thought of yet.

[-- 7) References --]

  * For information about ``Function Trampolines``:
    http://research.microsoft.com/sn/detours