[------------------------------------------------------------------------]
  [-- Uninformed Research -- informative information for the uninformed. --]
  [------------------------------------------------------------------------]
  [-- Genre  : Development                                               --]
  [-- Name   : needle                                                    --]
  [-- Desc   : Linux x86 run-time process manipulation                   --]
  [-- Url    : http://www.uninformed.org/                                --]
  [-- Use    : EVILNESS                                                  --]
  [------------------------------------------------------------------------]
  [-- Author : skape (mmiller@hick.org)                                  --]
  [-- Date   : 01/19/2003                                                --]
  [------------------------------------------------------------------------]

  [-- Table of contents:                                                 --]

      1) Overview
         1.1) Topics
         1.2) Techniques
         1.3) Execution Diversion
      2) Memory Allocation
      3) Memory Management
      4) Library Injection
      5) Code Injection
         5.1) Forking
         5.2) Threading
         5.3) Function Trampolines
      6) Conclusion
      7) References

  [-- 1) Overview                                                        --]

      So, you want to be evil and modify the image of an executing 
      process?  Well, perhaps you've come to the right place.  This 
      document deals strictly with some methodologies used to to
      alter process images under Linux.  If you're curious about how
      to do something similar to the things listed in this document in 
      Windows, please read the ``References`` section.  

  [-- 1.1) Topics                                                        --]

      The following concepts will be discussed in this document as they
      relate to run-time process manipulation:

      * Memory Allocation
   
         The use of being able to allocate and deallocate memory in a 
         running process from another process has awesome power for
         such scenarios as execution diversion (the act of diverting
         a processes execution to your own code), data hiding (the act
         of hiding data in a process image), and even, in some cases
         allocating dynamic structures/strings for use within a process
         for its normal execution.  These aren't the only uses, but
         they're all I could think of right now :).  See the
         ``Memory Allocation`` section for details.

      * Memory Management

         The ability to copy arbitrary memory from one process to another
         at arbitrary addresses allows for flexible manipulation of
         a given processes memory image.  This can be applied to copy
         strings, functions, integers, everything.  See the ``Memory
         Management`` section for details.

      * Library Injection

         The ability to inject arbitrary shared objects into a process
         allows for getting at symbols that an executable would not 
         normally have as well as allowing an evil-doer such as yourself
         to inject arbitrary PIC that can reference symbols in
         an executable without getting in trouble.  This alone is
         extremely powerful.  See the ``Library Injection`` section for 
         details.

      * Code Injection

         Well, when you get down to it, you just want to execute code
         in a given process that you define and you want to control
         when it gets executed.  Lucky for you, this is possible AND
         just as powerful as you'd hoped.  This document will cover
         three types of code injection:

         1) Forking

            The act of causing a process to create a child image
            and execute arbitrary code.

         2) Threading

            The act of causing a process to create a thread
            that executes an arbitrary function.

         3) Function Trampolines

            The act of causing a call to a given function to 
            'trampoline' to arbitrary code and then 'jump' back to
            the original function.

  [-- 1.2) Techniques                                                    --]

      As of this document I'm aware of two plausible techniques for
      altering the image of an executing process:

      * ptrace
         
         Likely the most obvious technique, the ptrace (process trace) API
         allows for altering of memory, reading of memory, looking and 
         setting registers, as well as single-stepping through a process.  
         The application for these things as it pertains to this document
         should be obvious.  If not, or if you're curious, read the 
         ``References`` section for more details on ptrace.

      * /proc/[pid]/mem
   
         This technique is more limited in the amount of things it can
         do but is by no means something that should be cast aside.  
         With the ability to read/write a given process's image, one
         could easily modify the image to do ``Code Injection``.  Doing
         things like memory allocation, management, and library 
         injection via this method are quote a means harder but *NOT*
         impossible.  They would take a decent amount of hackery though.
         (Theoretical, not proven yet, by me at least.)

  [-- 1.3) Execution Diversion                                           --]

      In order to do most of the techniques in this document we need to
      divert the execution of a running process to code that we control.
      This presents a few problems off the bat.  Where can we safely put
      the code that we want executed?  How could we possibly change the
      course of execution?  How do we restore execution once our code
      has finished?  Well, thankfully, there are answers to these 
      questions, and they're pretty easy to answer.  Let's start with 
      the first one.

      * Where can we safely put the code that we want executed?

      Well to answer this question you need to have a slight 
      understanding of how the process is laid out and how the flow of
      execution goes.  The basic tools you need in your knowledge base
      are that executables have symbols, symbols map to vma's that are
      used to tell the vm where symbols should be located in memory.
      This is used not only for functions, but also for global variables.
      With that said, we can tell where code will be in an executable
      based off processing the ELF image associated with the process.
      Example:

         root@rd-linux:~# objdump --syms ./ownme | grep main
         08048450 g     F .text  00000082              main

      This tells us that main will be found at 0x08048450 when the 
      program is executing.  But what good does this do us?  A lot.
      Considering the main function is the 'gateway' to normal code
      execution, it's an excellent place to use as a dumping zone for
      arbitrary code.  There are some restrictions, however.  The code
      has some size restrictions.  Here's the preamble and some code
      from main in ./ownme:

         root@rd-linux:~# objdump --section=.text                      \
                  --start-address=0x08048450 --stop-address=0x080484d4 \
                  -d ./ownme 

         ./ownme:     file format elf32-i386
   
         Disassembly of section .text:
   
         08048450 <main>:
         8048450:       55                      push   %ebp
         8048451:       89 e5                   mov    %esp,%ebp
         8048453:       83 ec 08                sub    $0x8,%esp
         8048456:       90                      nop    
         8048457:       90                      nop    
         8048458:       90                      nop    
         ...
         80484d0:       c9                      leave
         80484d1:       c3                      ret

      Granted, main isn't always the entry point, but it's easy to find
      out what is by the e_entry attribute of the elf header.  Now, the
      reason I say main is a great place to use as a dump zone is because
      it holds code that will _never be accessed again_.  This is the key.
      There are lots of other places you could use as a dumpzone. For
      instance, if the application contains a large helper banner, you
      could put code over the help banner considering the banner wont be 
      printed ever again once the program is executing.  Use your
      imagination, you'll think of lots more.  'main' is the most
      generic method, since it's guaranteed in every application.

      Well, now we know where we can safely put code to be executed, but
      how do we actually execute it?

      * How could we possibly change the course of execution?
      
      In order to change the course of execution in a process you need
      some working knowledge of ptrace and how the vm traverses an 
      executable.  Assuming you have both, read on.  On x86 there
      is a vm register used to hold the vma of the NEXT instruction.
      Once an instruction finishes, the vm processes the instruction
      at eip (the vm register) and increments eip by the size of the
      current instruction.  There are some instructions, such as jmp
      and call which are themselves execution diversion functions
      that cause eip to be changed to the address specified in the 
      operand.  We use this same principal when it comes to changing 
      our course of execution to what we want.

      Now, let's say that we theoretically put some of our own code
      at 0x08048450 (the address of main above) using the functionality
      from the ``Memory Management`` section.  In order to have
      our code get executed (since it would normally never get executed)
      we use ptrace's PTRACE_SETREGS and PTRACE_GETREGS functionality.
      These two methods allow a third party process to obtain the 
      registers and set the registers of another process.  These 
      registers include eip.  In order to change the execution we 
      perform the following steps:

         1) call PTRACE_GETREGS to obtain the 'current' set of
            registers.

         2) set eip in the returned set of registers to
            0x08048450 (the address of our code).

         3) call PTRACE_SETREGS with our modified structure.

         4) continue the course of execution.

      We've now successfully caused our code to be executed, but there's
      a problem.  We injected a small chunk of code that we wanted
      to be run, but then we wanted the process to return to normal
      execution.  That brings us to the next question.

      * How do we restore execution once our code has finished?

      Glad you asked, because this is the most important part.  In order
      to restore execution we need a to modify our injected code just
      a bit in order to make it easy for us to restore execution.  We 
      do this by adding an instruction near the end:

         int $0x3

      This is on Linux (and Windows) to signal an exception or breakpoint
      to the active debugger.  In the case of Linux, it sends a SIGTRAP,
      which, if the process is being traced will be caught by wait().
      
      Okay, so we've modified our code and let's say it looks something 
      like this:

         nop
         nop
         nop
         nop
         nop
         nop

         mov $0x1, %eax
         int $0x3
         nop

      The code is setup with a 6 byte nop pad at the top to make our 
      changing of eip more cleaner (and safer) due to the way the vm
      reacts to our execution diversion.  The movement of 1 into
      eax is just an example of our arbitrary code.  The int $0x3
      alerts our attached debugger (ptrace) and the nop is for padding
      so we can see when we hit the end of our code.

      Okay, that's a lot of stuff.  Let's walk through our modified
      process of execution now.  This assumes you've already injected
      your code at main (0x08048450):

         1) call PTRACE_GETREGS to obtain the 'current' set of
            registers

         2) save these registers in another structure.  This is used
            for restoration.

         3) set eip in the returned set of registers to 
            0x08048450 (the address of our code).

         4) call PTRACE_SETREGS with the modified structure.

         5) continue execution, but watch for signals with the wait()
              function.  If the wait function returns a signal
            that is a stop signal:

            a) call PTRACE_GETREGS and get the current set of registers

            b) if eip is equal to the size of your injected code - 1
               (the location of the nop at the end), you know you've
               reached the end of your code.  go to step 6 at this 
               point.

            c) otherwise, continue executing.
   
         6) at this point your code has finished.  call PTRACE_SETREGS
            with the saved structure from step 2 and you're finished.
            you've successfully diverted and reverted execution.

      That was a mouthful, but it's very important that it's understood.
      All of the topics in this document emplore this underlying
      logic to perform their actions.  Each one has a 'stub' assembly
      function that gets injected into a process at main to be executed.
      This code is meant to be small due to the fact that there are 
      potential size issues.

      Oh, and another thing, you have full control over every register
      in this scenario because the registers are restored with
      PTRACE_SETREGS before the 'normal' execution continues.

  [-- 2) Memory allocation                                               --]       

      Memory allocation is one of the key features in this documented
      as all of the sub topics in Execution Diversion are dependant
      on its functionailty.  Memory allocation allows for dynamic
      memory allocation in another process (duh).  The most applicable
      scenario with regards to this document for such a thing are the
      storage of arbitrary code in memory without size limitations.
      This allows one to inject a very large function for execution
      without having fear that they will overrun into another function
      or harmful spot.

      Memory allocation is relatively simple, but understanding how to
      get from a to b requires a bit of explaining.  The first thing we
      need to do is figure out where malloc will be in a given process
      image so that we may call into it.  If we can figure that out
      we should be home free considering what we know from section 1.3.

      Realize that all these steps below can and are easily automated,
      but for sake of knowing, here they are:

         1) Where could malloc possibly be?  Well, let's see what
            our choices are:

            root@rd-linux:~# ldd ./ownme 
               libc.so.6 => /lib/libc.so.6 (0x40016000)
               /lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x40000000)

            root@rd-linux:~# objdump --dynamic-syms --section=.text \
                              /lib/libc.so.6 | grep malloc             
               0006df90  w   DF .text  00000235  GLIBC_2.0   malloc
            root@rd-linux:~# objdump --dynamic-syms --section=.text \
                              /lib/ld-linux.so.2 | grep malloc
               0000c8f0  w   DF .text  000000db  GLIBC_2.0   malloc

            Alright, so we've got malloc in both libc and ld-linux.  We
            could probably use either but what about programs that don't
            use libc?  In order to be the most flexible, we should use
            ld-linux.  This also has a positive side effect which is
            that every elf binary has an 'interpreter', and, it just
            so happens to ld-linux is that interpreter.

         2) Alright, so we know the vma of malloc is at 0x0000c8f0,
            but that doesn't exactly look like a valid vma.  That's 
            because it's not.  It's an offset.  The actual vma
            can be calculated by adding the base address from ldd
            for ld-linux (0x40000000) to the offset (0x0000c8f0)
            which, in turn produces the full vma 0x4000c8f0.  Now
            we know exactly where malloc is.

         3) Cool, so we know where malloc is, now all we need to do is
            divert execution to some code that calls it and revert back.
            We also need the return address from malloc though so we 
            know where our newly allocated buffer is at.  Fortunately,
            this is quite easy with PTRACE_GETREGS.  eax will hold the 
            return value (cdecl).  The code is pretty simple and,
            considering we control all the registers, we can use
            them to pass arguments, such as size, into our code
            at the time of diversion.  Here's some code that will,
            when diverted to with the correctly initialized registers,
            call malloc and interrupt into the debugger:

               nop              # nop pads
               nop
               nop
               nop
               nop
               nop

               push %ebx        # push the size to allocate onto the stack
               call *%eax       # call malloc
               add  $0x4, %esp  # restore the stack
               int  $0x3        # breakpoint
               nop

            The above code expects the 'size' parameter in ebx and the 
            address of malloc in eax.

         4) Alrighty, so now we've executed our code and we're ready
            to restore the process to normal execution, but wait,
            we need the address malloc returned.  We simply use
            PTRACE_GETREGS and save eax and we've successfully
            allocated memory in another process, and we have the
            address to prove it.

      The same steps above can be used for deallocating memory, simply
      s/malloc/free/g and you're set :).

  [-- 3) Memory management                                               --]

      I'm only going to briefly cover the concept of copying memory
      from one process to another as it's sort of out of the scope
      of this document.  If you're more curious, read about memgrep
      in the ``References`` section.

      Copying memory from one process to another simply entails the
      use of PTRACE_POKEDATA which allows for writing 4 bytes of data
      to a given address inside a process.  Not much more is needed
      to be known from that point on :).
  
  [-- 4) Library Injection                                               --]

      Library injection is very powerful when it comes to using 
      functionality inside a running process that it wasn't meant to
      be doing.  One of the more obvious applications is that of loading
      a personally developed shared object into a running executable.

      This one was fun to figure out, so I'll just kind of walk you
      through the process I took.

      First thing's first, we need to figure out how to load a library
      without the binary being linked to libdl.  libdl is what provides
      functions like dlopen(), dlsym(), and dlclose().  The problem is
      that executables don't link to this library by default.  That means
      we can't do our magic technique of figuring out where dlopen will
      be in memory because, well, it isn't guaranteed to be there.

      There's still hope though.  dl* functions are mainly just stubs
      that make calling the underlying API easier.  Kind of like how
      libc makes calling syscalls easier.  Since these are just wrappers,
      there have to be implementers, and indeed, there are.  Check this
      out:

         root@rd-linux:~# objdump --dynamic-syms /lib/libc.so.6 | \
                           grep _dl_ | egrep "open|close|sym"
         000f7d10 g    DF .text  000001ad  GLIBC_2.2   _dl_vsym
         000f6f10 g    DF .text  000006b8  GLIBC_2.0   _dl_close
         000f6d80 g    DF .text  00000190  GLIBC_2.0   _dl_open
         000f7c00 g    DF .text  0000010d  GLIBC_2.2   _dl_sym

      Well, isn't it our lucky day?  libc.so.6 has _dl_open, _dl_sym, and 
      _dl_close.  These look amazingly similar to their dl* wrappers.
      In fact, they're almost exactly the same.  Compare the prototypes:

         extern void *dlopen (const char *file, int mode);
         extern void *dlsym (void *handle, const char *name) 
         extern int dlclose (void *handle);

      To:

         void *_dl_open (const char *file, int mode, const void *caller);
         void *_dl_sym (void *handle, const char *name, void *who);
         void _dl_close (void *_map);

      Pretty much the same right?  Looks very promising.  So here's what
      we know as of now:

         * We know where the _dl_* symbols will be at in the processes
           virtual memory.  (We can calculate it the same way we did 
           malloc)
         * We know the prototypes.

      One thing we don't know is how the functions expect their arguments.
      One would think they'd be stack based, right?  Well, not so.  They 
      seem to use a variation of fastcall (like syscalls).  Here's a
      short dump of _dl_open:

         000f6d80 <.text+0xdde00> (_dl_open):
           f6d80:       55                      push   %ebp
           f6d81:       89 e5                   mov    %esp,%ebp
           f6d83:       83 ec 2c                sub    $0x2c,%esp
           f6d86:       57                      push   %edi
           f6d87:       56                      push   %esi
           f6d88:       53                      push   %ebx
           f6d89:       e8 00 00 00 00          call   0xf6d8e
           f6d8e:       5b                      pop    %ebx
           f6d8f:       81 c3 ba 10 02 00       add    $0x210ba,%ebx
           f6d95:       89 c7                   mov    %eax,%edi
           f6d97:       89 d6                   mov    %edx,%esi
           f6d99:       89 4d e4                mov    %ecx,0xffffffe4(%ebp)
           f6d9c:       f7 c6 03 00 00 00       test   $0x3,%esi
           f6da2:       75 1c                   jne    0xf6dc0
           f6da4:       83 c4 f4                add    $0xfffffff4,%esp

      Looks pretty normal for the most part right?  Well, up until 0xf6d95
      at least.  It's quite odd that it's referencing eax, edx, and ecx 
      which have not been initialized in the context of _dl_open, and then
      using them and operating on them later in the function.  Very strange 
      to say the least.  Unless, of course, the arguments are being passed
      in registers instead of via the stack.  Let's look at the source
      code for _dl_open.

      void *
      internal_function
      _dl_open (const char *file, int mode, const void *caller)
      {
           struct dl_open_args args;
           const char *objname;
           const char *errstring;
           int errcode;
         
           if ((mode & RTLD_BINDING_MASK) == 0)
                /* One of the flags must be set.  */
                _dl_signal_error (EINVAL, file, NULL, 
                  N_("invalid mode for dlopen()"));

         ....

      }

      Okay, so we see roughly the first thing it does is do a bitwise and 
      on the mode passed in to make sure it's valid.  It does the and 
      with 0x00000003 (RTLD_BINDING_MASK).  Do we see any bitwise ands 
      with 0x3 in the disasm?  We sure do.  At 0xf6d9c a bitwise and is 
      performed between $0x3 and esi.  So esi must be where our mode is 
      stored, right?  Yes.  Let's see where esi is set.  Looks like it 
      gets set at 0xf6d97 from edx.  Okay, so maybe edx originally 
      contained our mode.  Where does edx get set?  No where in _dl_open. 
      That means the mode must have been passed in a register, and not on 
      the stack.  

      If you do some more research, you determine that the arguments 
      are passed as such:

         eax = library name (ex: /lib/libc.so.6)
         ecx = caller (ex: ./ownme)
         edx = mode (ex: RTLD_NOW | 0x80000000)

      Alright, so we know how arguments are passed AND we know the address 
      to call when we want to load a library.  From this point things 
      should be pretty obvious.

      All one need do is allocate space for the library name and the 
      caller in the image using the ``Memory Allocation`` technique.  
      Then copy the library and image using the ``Memory Management``
      technique.  Then, finally, execute the stub code that loads the 
      library.  That code would look something like this:

         nop                  # nop pads
         nop
         nop
         nop
         nop
         nop
   
         call *%edi           # call _dl_open
         int  $0x3            # breakpoint
         nop

      This code expects the arguments to already be initialized in the 
      proper registers from what we determine above and it expects 
      _dl_open's vma to be in edi.

      Welp, we've successfully injected a shared object into another 
      processes image.  What you do from here is up to the desired 
      outcome.  Calling _dl_sym and _dl_close uses the same code as above,
      but their arguments are as follows:

      _dl_sym expects:

         eax = library handle opened by _dl_open
         edx = symbol name (ex: 'pthread_create')

      _dl_close expects:
   
         eax = library handle opened by _dl_open

  [-- 5) Code Injection                                                  --]

      I must say we're getting rather hardcore, we can allocate memory,
      copy memory and load shared objects into arbitrary processes.  
      What more could we possibly want?  How about some arbitrary, 
      controlled code execution that isn't limited by size?  Sounds
      spiffy!

  [-- 5.1) Forking                                                       --]

      Let's say we want to fork a child process inside the context of
      another process and have it execute an arbitrary function
      that we've allocated and stored in the processes memory image
      via the ``Memory Allocation`` and ``Memory Management`` methods.
      Doing the fork is as simple as writing up some code that will
      use ``Execution Diversion`` to fork the child and return control
      to the parent as if nothing happened.  An example of forking
      and executing a supplied function is as follows:

         nop                  # nop pads
         nop
         nop
         nop
         nop
         nop
   
         mov  $0x2, %eax      # fork syscall
         int  $0x80           # interrupt
         cmp  $0x00, %eax     # is the pid stored in eax 0? if so, 
                              # we're the child
         jne  fork_finished   # since eax wasn't zero, it means we're the 
                              # parent.  jmp to finished.
         push %ebx            # since we're the child, we push the start 
                              # addr
         call *%edi           # then we call the function
         mov  $0x1, %eax      # exit the child process
         int  $0x80           # interrupt
      fork_finished:
         int  $0x3            # we're the parent, we breakpoint.
         nop

      This code expects the following registers to be set:

         ebx = the argument to be passed to the function
         edi = the vma of the function call in the context of the child.
  
      Forking is really as simple as that.   Now, one side effect is that
      if the daemon does not expect fork children (ie, it doesn't call
      wait()) then your child process will show up as defunct when it
      exits due to not being cleaned up properly.  There are ways around    
      this, though.  You could use the ``Execution Diversion`` technique
      to perform cleanup of exitted children after for the process.
 
  [-- 5.2) Threading                                                     --]

      Similar to forking, but different by the fact that a thread runs
      in the context of the caller and shares memory, threading allows
      for pretty much the same things that forking does.  There are
      some risks with threading though.  For instance, it is _NOT_ safe
      to create a thread in a process that does not natural thread.  This
      is for multiple reasons -- the most important being that the 
      threading environment is setup at load time (in the case of 
      pthreads).  If Linux didn't use some ghetto application-level
      threading architecture, things wouldn't be so bad.  

      If you really do want to take the risk of creating a thread, 
      the process would be something like this:

         1) Inject libpthread.so into the process (``Library Injection``)
         2) Find pthread_create's vma in the process 
            (``Library Injection``)
         3) Allocate and copy user defined code (``Memory Allocation``)
         4) Perform ``Execution Diversion`` on the stub code to
            create the thread.  An example of such code is:

            nop                  # nop pads
            nop
            nop
            nop
            nop
            nop
         
            sub  $0x4, %esp      # space for the id
            mov  %esp, %ebp      # store esp in ebp for pushing
            push %ebx            # push argument
            push %eax            # push function
            push $0x0            # no attributes
            push %ebp            # push addr to store thread id in
            call *%edi           # call pthread_create
            add  $0x14, %esp     # restore stack
            int  $0x3            # breakpoint
            nop

      Like I said, threading is dangerous.  Know your program before
      attempting to inject a thread.  You will get odd results if
      you inject a thread into a process that doesn't naturally thread.

  [-- 5.3) Function Trampolines                                          --]

      Function trampolines are a great way to transparently hook arbitrary
      functions in memory.  I'll give a brief overview of what a function
      trampoline is and how it works.
    
      The basic jist to how function trampolines work is that they 
      overwrite the first x instructions where the size of the x 
      instructions is at least six bytes.  The six bytes come from the 
      fact that on x86 unconditional jumps take up 6 bytes in opcodes.
      The x instructions are replaced with the jmp instruction that 
      jumps to an address in memory that contains the injected function.
      This function runs before the actual function runs, and thus, has
      complete control over whether the actual function even gets called.
      At the end of the injected function the x instructions are appended
      as well as a jump back to the original function plus the size of
      the x instructions.  Here's an example:

      Let's say we want to hook the function 'testFunction' in the 
      executable 'ownme'.

         root@rd-linux:~# objdump -d ownme --start-addr=0x080484d4

         ownme:     file format elf32-i386

         Disassembly of section .init:
         Disassembly of section .plt:
         Disassembly of section .text:

         080484d4 <testFunction>:
           80484d4:       55                      push   %ebp
           80484d5:       89 e5                   mov    %esp,%ebp
           80484d7:       83 ec 18                sub    $0x18,%esp
          ...
           8048500:       c9                      leave  
           8048501:       c3                      ret    

      Well, it looks like the first 3 instructions match our criteria 
      of at least 6 bytes.  Let's keep those 6 bytes of opcodes
      tucked away for now.

      We need to be smart here.  We're going to do a jmp that
      says jmp to address stored in address x.  We're also
      going to want to restore back to the original place.  That means
      when we allocate our memory we should allocate it in a format like
      this:

         [ 4 bytes storing the address of our code                 ]
         [ 4 bytes storing the address to jmp back to              ]
         [ X bytes of arbitrary code                               ]
         [ X bytes containing the X instructions that we overwrote ]
         [ 6 bytes for the jump back                               ]

      So let's say we want to inject this code and we allocated
      a buffer in the process of the approriate length which starts
      at 0x41414140:

         nop
         movb $0x1, %al

      Our actual buffer in memory would look something like this

         0x41414140 = 0x41414148 (address of our code)
         0x41414144 = 0x080484d8 (address to jmp back to)
         0x41414148 = 3 bytes (nop, movb)
           0x4141414B = 6 bytes of preamble from testFunction
           0x41414152 = jmp *0x41414144
      
      The last step now that we have our code injected is to overwrite
      the actual preamble (the 6 bytes of testFunction) with the jmp
      to our code.  The assembly would look something like this:

         jmp *0x41414140  # Jump to the address stored in 0x41414140

      Once that's overwritten, we're home free.  The flow of 
      execution goes like this:

         1) Call to testFunction
         2) First instruction of testFunction is:
            jmp *0x41414140
         3) vm jumps to 0x41414148 an executes:
            nop
            movb $0x1, %al
            push %ebp
            mov  %esp, %ebp
            sub  $0x18, %esp
            jmp  *0x41414144
         4) vm jumps to 0x080484d8
         5) Function executes like normal.

      That's all there is to it.  There are a couple of restrictions
      when using trampolines:

         1) NEVER modify the stack without restoring it before
            the original functions preamble gets called.  Bad
            things will happen.
         2) Becareful what registers you modify.  Some functions
            may use fastcall.

      For more information on function trampolines, see the ``References``
      section.
  
  [-- 6) Conclusion                                                      --]

      That about wraps it up.  You now have the tools to allocate,
      copy, inject libraries, create forks, create threads, and
      install function trampolines.  You also have the underlying 
      concept of ``Execution Diversion`` which can be applied across
      the board to even more things I haven't even thought of yet.

  [-- 7) References                                                      --]

      * For information about ``Function Trampolines``:
   
         http://research.microsoft.com/sn/detours