/RE-Essentials: Disassemble that Binary


  

  

Okay, since you're here: this is not a workshop for beginners. It's more about organizing the thought process so you can actually reach a certain level of knowledge in RE and be comfortable with it. I am not going over all the basics of X86, Assembly and C code constructs, although at some point I will publish dedicated explanations of certain topics. For now, consider this a refresher; along the way I will reference better resources so you can practice anything you don't know, or get a deeper look at how it works.

 

 

This Episode's Flow:

 [+] Life of a binary before disassemblers and debuggers

 [+] Remember, Remember, X86 | X86-64

 [+] There is more to it than what meets the eye

 

 

         

[+]Life of a binary before disassemblers and debuggers
 

We'll start by discussing how a program is laid out in memory, from loading its different parts to executing it.


First, your intent and action to double click and run an application makes the OS warm up: it allocates some memory for the program and places a copy of it, aka the _Program image_, in RAM. As we know, RAM is where active programs are placed, and by active I mean programs AT WORK.

 

Second, now that the program is in RAM, or let's say Main Memory, the OS gives the newly arrived program some attention and tidies up an _Address Space_ in which the program's routines are placed and executed.

 

This is where we find the 3 segments .text, .data, .bss; if you have practiced writing some assembly _in any dialect_, you had to define these 3 main segments; the system uses them to organize the program in RAM:

 

->.text: where the machine instructions, aka Code, are put. This is where the IP _instruction pointer_ fetches instructions from, and where it suddenly jumps in the middle of execution to run another piece of code before returning to its normal flow. .text is readable/executable only; you cannot write into this segment, not anymore.
 

->.data: where your initialized global/static variables reside. .data is writable and can change at runtime.

->.bss: where uninitialized global/static variables are located. All of them are initialized to Zero when the program is loaded.

/* Simple C program to show the difference between .bss and .data */

int global;                 // an uninitialized global variable, stored in .bss
int global_one = 10;        // an initialized global variable, stored in .data

int main (void)
{
    static int i = 50;      // an initialized static variable, stored in .data
    static int j;           // an uninitialized static variable, stored in .bss

    return 0;
}

 

The OS will then set up The Stack and The Heap:

->The Stack: a Data Structure that is perfect for temporary data storage, where you can PUSH data for manipulation _calculations, etc._ and POP it when you're done. It is designed in a LIFO manner specifically for Program Flow _for routines to call other subroutines_. The Stack is marked non-executable and Read/Write (RW), meaning that if the IP _Instruction Pointer_ ever points somewhere on the stack, you will get the _you're trying to execute non-executable memory_ kinda error. Recall the .text segment? That is where executable code should be located, and the IP fetches instructions from there.

->The Heap: a less structured memory region that sits opposite The Stack and grows in the opposite direction. This is where C functions like malloc, free, etc. work: you can allocate memory and free it dynamically. The Heap is readable and writable, i.e. a dynamic memory structure.

 

Now your program is laid out perfectly in RAM.

 

Long story short: the stack grows toward location Zero in memory, and by PUSHing data in, you decrement the stack pointer by the data size. The stack grows down, grows toward lower addresses, shrinks or grows; these are all mental visualizations to describe what is going on.

 

 

Basic-Memory-Mapping

[user space part of the main memory]

 

This is the very basic Program Layout in memory. It describes a single Program Image; there are many of these inside the memory.
Add more places for the DLLs loaded with the program and space for Kernel-Land, and you can pretty much say this is all of it. Now, there is a reason the Stack and the Heap are neighbors in this manner, but we will not get into it here.

 

 

Third, Magic..

 

 

         

[+]Remember, Remember, X86 | X86-64
 

I cannot stress this enough: this is important. If we want to hack into binaries, or know why our binary was hacked, we need to know what a binary is. I will go through this topic and review it in later episodes until we really get comfortable with it. Now, if you couldn't care less;

 

[The Shellcoders Handbook 2nd Edition]

 

 

X86 | X86-64 Architecture , Registers and Data Types:

 

X86 architecture can operate in two modes:

→ Real mode: the state the processor is in when it's first powered on; it supports only a 16bit instruction set.

→ Protected mode: the mode in which modern operating systems run. It's the processor state in which VM (Virtual Memory / Paging ... etc) is supported.

 

X64 / X86-64 is an extension of the X86 arch that supports a 64bit instruction set. Like X86, it uses variable length instructions. For more on Instruction Sets and different Computer Architectures go here .

 

Now we need to go over some important stuff about X86 / X64. I suggest you go over the basics, with all the types of instructions etc., using this resource . This takes time, and as I said, we need to Remember how important it is.

 

         

Registers:

The CPU's basic storage units, there mainly to save the CPU time so it doesn't need to access RAM. They come as GPRs _General Purpose Registers_, Control Registers, Segment Registers and EFLAGS.
X86 has 8 GPRs; some of them can be further divided into 16bit and 8bit registers.

 

GPRs are often used for specific operations, but you can use them as you want. Here are some conventions:

AX -> Accumulator Register, stores the Return Value of a function if the return value doesn't exceed the Register Size.

BX -> Base Register, contains a Pointer to Data.

CX -> Counter Register, used as a loop counter and to hold shift counts.

DX -> Data Register, used for I/O and Arithmetic.

// AX BX CX DX are each divided into two 8bit registers, AH/AL etc.

SP | BP -> Pointer Registers, used to store stack addresses; the Stack Pointer points to the Top of the Stack | the Base Pointer points to the Base of the Stack Frame, respectively.

SI | DI -> Index Registers, used to point to data as Source Index | Destination Index respectively, to load from/write to memory during Stream Operations.

GPRs are extended to 32bit (EAX, EBX, ...) and 64bit (RAX, RBX, ...) registers.

 

Segment Registers are 6 in total. Their original purpose is to store the Starting Addresses of the Segments of the Binary being executed, though their use has been extended nowadays and they are no longer used solely for this purpose.

CS -> Contains the Starting Address of the Code Segment, which contains all instructions to be executed.

DS -> Contains the Starting Address of the Data Segment.

SS -> Contains the Starting Address of the Stack Segment, which contains the return addresses of subroutines and procedures.

ES, FS, GS -> Extra Segment Registers, which are filled with data from the Operating System or Exception Handlers; e.g. on Windows, FS is used for Thread Handling structures like the TEB/PEB.

 

         

Data Types:

X86: Intuitively, an instruction like [mov   eax, 0x666] stores the value as a 32bit dword _extended with 0s_, because EAX is 32 bits wide. Other sizes are selected with an explicit size specifier (byte ptr, word ptr, dword ptr) or with differently sized registers.

X64: yes.. 64bits "qword".

 

                  

Instruction Set: Memory Manipulation | Arithmetic Operations:

 

Data Movement is classified into five general methods:

 -> Immediate to Register
 -> Immediate to Memory
 -> Memory to Register / Register to Memory
 -> Memory to Memory (only through dedicated instructions such as MOVS, a plain mov cannot do it)
 -> Register to Register

 

Let's look at these Data Movement instructions:


01:  mov  dword ptr [eax],  1    
     ;  set memory at address EAX to 1, extended with 0s to fill the 32bits

02:  mov  ecx,  [eax]          
     ;  set ECX to the content at memory location pointed to by EAX

03:  mov  [eax],  ebx           
     ;  set memory at address EAX to EBX
04:  mov  [esi+34h],  eax       
     ;  set memory at location [esi+34h] to EAX
05:  mov  eax,  [ebx+esi*4]     
     ;  set EAX to memory at location [ebx+esi*4]

 

 

In C terms, the ASM Square Brackets [] are the equivalent of dereferencing a Pointer, e.g.:

mov     [esi+34h], eax     
 
; the square brackets in [esi+34h] tell you that this is a pointer
; to the memory address calculated as _esi+34h_
; now go there and set the value/data stored at this memory address to _eax_.

 

 

Let's look at this example in Pseudo C:


int  var = 5;    
int  *ptr  =  &var ;
    // the * in the declaration says that ptr is a pointer: its job is to point to a memory location.
    // a pointer explicitly set to 0/NULL is a NULL pointer; an uninitialized one points nowhere meaningful.
    // setting ptr to &var makes it point to var's memory location, obtained with the & operator
 
ptr == 0x7ffcae30c71462    // var's location in memory
*ptr == 5                  // actual value of var

 

 

Now translating the ASM instructions to Pseudo C:


/* the unary * operator in C accesses the memory location pointed to by the pointer, to read its value or assign to it */

01:  *(eax)  =  1;       
     ; setting the content at location eax to 1
02:  ecx  =  *(eax);
03:  *(eax)  =  ebx;
04:  *(esi+34h)  =  eax;
05:  eax  =  *(ebx+esi*4);  

                  

 

 

[+]Memory Access: [Base + offset]:

 

This form is commonly used to access structure members or data buffers, where the base is a register and the offset is either an immediate or another register.

  ; KDPC structure with ECX as the base register and +0xXXX as the member offset

kd>  dt   nt!_KDPC

+0x000              Type                  :UChar             ; 1byte
+0x001              Importance            :UChar             ; 1byte
+0x002              Number                :Uint2B            ; 2bytes
+0x004              DpcListEntry          :_LIST_ENTRY
+0x00c              DeferredRoutine       :Ptr32 void
+0x010              DeferredContext       :Ptr32 Void
+0x014              SystemArgument1       :Ptr32 Void
+0x018              SystemArgument2       :Ptr32 Void
+0x01c              DpcData               :Ptr32 Void

 

; Assembly
 
01: 8B 45 0C         mov     eax,   [ebp+0Ch]         
        ; reading memory at (ebp+0C), setting EAX to this value
02: 83 61 1C 00      and     dword ptr [ecx+1Ch],   0 
        ; writing 0 to memory at address (ecx+1C)-> DpcData
03: 89 41 0C         mov    [ecx+0Ch],   eax          
        ; writing EAX to memory at address (ecx+0c) DeferredRoutine
04: 8B 45 10         mov    eax,   [ebp+10h]          
        ; reading memory at (ebp+10h), storing it into EAX
05: C7 01 13 01 00+  mov    dword ptr [ecx],   113h   
        ; writing dword 0x113 into memory at (ecx) -> base register
06: 89 41 10         mov     [ecx+10h],   eax         
             ; DeferredContext = EAX

  // Pseudo C

KDPC *p = ...;              // p points to the structure (ECX)
p->DpcData = NULL;
p->DeferredRoutine = ... ;
*(int *)p = 0x113;         // mov dword ptr [ecx], 113h 
                           // overwriting the first 32 bits of the structure
p->DeferredContext = ... ; 

 

 

Now we need to give some attention to instruction 05. The value is written as 32 bits, as specified by the size specifier dword ptr, so a single store overwrites the first three members of the structure, "Type", "Importance" and "Number", which are 1, 1 and 2 bytes in size.

05:   C7 01 13 01 00+     mov   dword ptr [ecx],   113h
      ;      00000000 00000000    00000001    00010011
      ;          Number          Importance     Type

05A:  C6 01 13            mov   byte ptr [ecx],    13h
05B:  C6 41 01 01         mov   byte ptr [ecx+1],   1
05C:  66 C7 41 02 00+     mov   word ptr [ecx+2],   0

; one dword-sized store does the work of three separate instructions (05A-05C).
; also note that we accessed memory at different granularity levels: byte, word and double-word.

                  

 

 

[+]Memory Access: [Base + Index * scale]:

 

This form is used to access Array-Type Objects, with a base register holding the start address of the array, an index register counting over the elements, and a scale giving the size in bytes of each element (1, 2, 4 or 8).


; edi points to a structure with some variable at offset +0 and a pointer to an array at offset +4

01:                    loop_start:
02:  8B 47 04            mov  eax, [edi+4]    
            ; reading the array's base address from (edi+4) and saving it to EAX
03:  8B 04 98            mov  eax, [eax+ebx*4]
            ; EAX is the base address of the array, EBX the index, with a scale of 4bytes
04:  85 C0               test eax, eax 
            ; ANDing eax with itself to check if it's 0; if array[i] != 0
.
.
.
05:  74 14               jz   short loc_7F627F
06:                    loc_7F627F:
07:  43                  inc  ebx            
                       ; incrementing the index
08:  3B 1F               cmp  ebx, [edi]     
                       ; comparing i with the value at offset edi+0
09:  7C DD               jl short loop_start

 


// pseudo C
struct {
   dword size;       // +0
   dword *array;     // +4, pointer to the elements
} s;
for (i = ..;  i < s.size; i ++ ){

   if(s.array[i] != 0)
   {....}

}

                  

 

 

[+]Memory Access: Copying String/Memory between two memory locations:

 

-> MOVSB, MOVSW, MOVSD: move 1, 2, 4 bytes respectively from memory to memory. The MOVS instruction implicitly uses ESI as the Source Index and EDI as the Destination Index, and after each move it updates both according to the DF "Direction Flag": if "DF=0" both ESI and EDI are incremented, if "DF=1" both are decremented. The update value is equal to the size specified by MOVS, so it's either (+-) 1, 2 or 4 bytes.

 


01: BE 28 B5 41 00     mov  esi,   offset _RamdiskBootDiskGuid
           ; ESI Pointer to RamdiskBootDiskGuid
02: 8D BD 40 FF FF+    lea  edi,   [ebp-0C0h]
           ; EDI is an address somewhere on the stack
03: A5                 movsd
04: A5                 movsd
05: A5                 movsd
06: A5                 movsd
          ; each movsd moves 4bytes from [ESI] to [EDI]
          ; then ESI and EDI are both updated by 4bytes, in the direction given by DF

 


/* pseudo C */
/* some structure of 16 bytes in size */

GUID RamdiskBootDiskGuid = ..... ;       // base_GUID+offset_Ramdisk...
GUID foo;                                // base_GUID+offset_foo

memcpy(&foo, &RamdiskBootDiskGuid, sizeof(GUID));

 

 

In some cases MOVS is accompanied by the REP prefix, which repeats the instruction with ECX as the counter.


01: BE 28 B5 41 00     mov   esi,   offset _RamdiskBootDiskGuid         
02: 8D BD 40 FF FF+    lea   edi,   [ebp-0C0h]
03: 6A 04              push  4
04: 59                 pop   ecx            
                  ; setting counter to 4; ecx = 4

05: F3 A5              rep   movsd

; a combination of movsd, movsw and movsb can be used to copy a number of bytes that is not a multiple of 4.

 

 

Now is the perfect spot to talk about mov vs. lea, using instructions 01-02. LEA has the format "lea  destination, source"; it doesn't access memory, it just calculates a memory address and puts it in the destination. It is not used exclusively for memory addresses: LEA comes in handy for calculating values without accessing memory, resulting in fewer instructions.

There is a huge difference between:

  mov   eax, [ebx+4]   and lea   eax, [ebx+4].

The first loads EAX with the 4 bytes stored at address ebx+4, the second loads EAX with the address ebx+4 itself.

[ Practical Malware Analysis ]
                                
 

 

[+]Memory Access: Scan String in Memory:

 

-> SCASB, SCASW, SCASD: scan 1, 2, 4 bytes in memory against the AL/AX/EAX registers respectively. The data/string in memory starts at address EDI, which is automatically incremented/decremented depending on the DF bit.

; implementation of C/C++ strlen(): the register to compare the string with is set to Zero by XORing it with itself, then memory is scanned one B/W/D at a time (matching the register size); when reg == mem[x], we have reached the NULL-byte at position x of the string.
; strlen(edi)

01:  30 C0     xor    al,    al    
             ; NULLing the al reg ; scanning 1 byte at a time
02:  89 FB     mov    ebx,   edi   
             ; saving the original pointer
03:  F2 AE     repne  scasb       
  ; repeat scanning the bytes at EDI against AL
  ; while AL is not equal to the byte being scanned and ECX is not 0
  ; (ECX is assumed to have been set to a large enough count beforehand)

  ; EDI is updated after every byte with respect to the DF flag
  ; eventually EDI will point one byte past the NULL-byte

04:  29 DF     sub    edi,   ebx       
  ; subtract the original pointer in EBX from EDI; the result is the string length plus 1 for the NULL-byte

                     

 

 

[+]Memory Access: Compare String to String in Memory:

 

-> CMPSB, CMPSW, CMPSD: compare two strings, both in memory, specified by ESI:EDI as Source:Destination, or in other words the two strings as operands.

; compare memory to memory

01:  lea  esi,   [var1]
02:  lea  edi,   [var2]

03:  cmpsd

; compares the dword at [ESI] with the dword at [EDI] and sets the flags; both pointers are then updated according to the DF bit.

                     

 

 

[+]Memory Access: Store String in Memory:

 

-> STOSB, STOSW, STOSD: store 1, 2, 4 bytes from AL/AX/EAX respectively to the destination address EDI; EDI is then updated according to the value of the DF flag.
It is commonly used to initialize a buffer to a constant value.

; implementation of memset()
; memset(edi, 0, 36)


01:   33 C0     xor    eax,   eax  
             ; NULLing eax; storing dword
02:   6A 09     push   9
03:   59        pop    ecx   
04:   8B FE     mov    edi,   esi 
             ; setting the destination address
05:   F3 AB     rep    stosd

   ; with 9 repeats we store the 4byte EAX 9 times, "36bytes" in total, starting at EDI
   ; EDI is updated by +/-4bytes on each repeat based on the DF flag, so we end up zeroing a 36-byte buffer with (edi) as base.

                     

 

 

[+]Memory Access: Load String from Memory:

 

-> LODSB, LODSW, LODSD: from the same family, used for loading data/string from memory at source address ESI and saving it in AL/AX/EAX. On X64, LODSQ loads 8 bytes from RSI into RAX.

; Memory to Register

01:  xor    rax,  rax
02:  lea    rsi,  [var1]
03:  lodsq  

                     

 

As you can clearly see, this seems to never end, and yes, there is a plethora of instructions for memory manipulation, all of which can be used to implement the functionality of the C/C++ memory manipulation functions. C has a very thin line separating it from machine code, considering that assembly is just a human-readable representation of machine code, and that assembly has many flavors, each built for a different system. Because of this fine line, C is considered very powerful and very dangerous when dealing with memory; of course this depends on where you are :). This is why, over time, we got high-level languages that put a good separation between the developer and the memory/machine, to make it harder to blindly destroy stuff. We end up doing it anyway, but they are considered pretty much safer, easier and more scalable.

                     

 

 

Instruction Set: Stack Operations | Function Invocation:

 

Yep, this is where most crimes begin.. suicides too. It's the playground of functions, data, and instructions. Again, this is very important to grasp; if you end up familiar with it -and by familiar I mean that looking at it no longer makes you want to jump right out of the window- the inner workings of programs will get crystal clear. Now let's begin with the playground; The Stack:

In computer science, a call stack is a data structure that stores information about the active subroutines of a computer program. This kind of stack is also known as an execution stack, program stack, control stack, run-time stack, or machine stack, and is often shortened to just The Stack. The Stack has 3 primary tasks: Passing Function Parameters, Local-Data Storage, Storing Return Addresses.

 

 

[+]Stack Layout:

 

Because The Stack is a LIFO structure, it has two instructions to literally PUSH and POP data; these two instructions are PUSH, POP :). The Stack, or let's say that memory region, is pointed to by the stack pointer ESP. ESP is very dynamic as it always points to the top of the stack, so when we PUSH new data ESP gets decremented -recall the stack grows toward memory Zero?- and when we POP data off the stack, ESP gets incremented. In 32bit code ESP is updated by +-4 bytes, or by 2 bytes with an operand-size override prefix.

; ESP = 0xb20000
; all data are loaded into 32bit registers

01:  B8 AA AA AA AA   mov   eax, 0AAAAAAAAh
02:  BB BB BB BB BB   mov   ebx, 0BBBBBBBBh
03:  B9 CC CC CC CC   mov   ecx, 0CCCCCCCCh
04:  BA DD DD DD DD   mov   edx, 0DDDDDDDDh
05:  50               push  eax
; ESP will be decremented to 0xb20000-4 = 0xb1fffc
; where the value 0xAAAAAAAA will reside on top of the stack

06:  53               push  ebx
; ESP will be decremented to 0xb1fffc - 4 = 0xb1fff8
; the memory at ESP will hold the value 0xBBBBBBBB

07:  5E               pop   esi
; ESI = *(ESP) = 0xBBBBBBBB
; ESP +4 = 0xb1fffc

08:  5F               pop   edi
; EDI = *(ESP) = 0xAAAAAAAA
; ESP +4 = 0xb20000

 

 

If you like visuals;

[ Practical Reverse Engineering ]

                     

 

 

[+]Stack Frames. Function Calls:

 

[What happens in Vegas, stays in Vegas]

Since we are still mentally sound, this topic should be a soft blow. Stack Frames are LIFO data structures used to contain subroutine state information: local variables, the return address and the caller's saved base pointer.
Local Variables are local to the function being called: when you define a function and define variables inside it, they live in its local scope. The Return Address is simply where to go when the function's execution is finished; it is handed to the great EIP register. EIP normally flows in sequential order, and is suddenly handed a different location when a subroutine/function-call pops up. With each subroutine being called, a dedicated stack frame is initialized, holding its parameters and local variables. If you're lucky this could be it, but most likely we are not lucky and we will find ourselves inside an inception of nested functions and nested stack frames. Interestingly enough, each subroutine/stack-frame contains its caller's base pointer, to not lose mommy.
 

 

[ the most famous illustration in the RE land ]

 

 

Let's dissect the Inception by implementing this function:

// C
// Function Definition

int addme(short a, short b)
{
    .
    .                     // some local variables / functionality.
    .   
    return a + b;
}


// Function Call / Invocation
addme(x, y);

 

 

 

First:

[Where we at?]

The program's execution flow changes when a function is called. Everything involved in a call is evaluated at runtime in a fixed order; this is very logical and makes sense, and it's called "The Call Stack". You start by evaluating the function's parameters that are passed with the function call -if it has any-, then you jump to the function's location/definition after saving the return address -the location right after where the function was called-, you set up the function's local variables -if there are any-, then you execute the function, evaluate its return value and save it -if there is any-, then you go back to resume execution at the return address you saved. Of course it's not you who does all this stuff, but we are keeping a close eye on it.

; Function Invocation
01: 004129F3 50          push  eax
02: 004129F8 51          push  ecx                  
03: 004129F9 E8 F1 FF FF call  addme 
                   ; push the return address (004129FE) on the stack, hand EIP the addme() location.                                           
04: 004129FE 83 C4 08    add   esp, 8

; Function Definition/Location                   
                   addme:
15: 004113A0 55          push  ebp                         
16: 004113A1 8B EC       mov   ebp, esp
                   ; Function Prologue                   
17: ...            
         ; reserving some space for the function's variables
         ; part of the reason variables need to be declared in C is to aid the construction of this section of code.
18: 004113BE 0F BF 45 08 movsx eax, word ptr [ebp+8]
19: 004113C2 0F BF 4D 0C movsx ecx, word ptr [ebp+0Ch]
20: 004113C6 03 C1       add   eax, ecx
21: .....          
         ; function execution
22: 004113CB 8B E5       mov   esp, ebp                  
23: 004113CD 5D          pop   ebp                         
24: 004113CE C3          retn            
             

 

 

 

Second:

[The Function's Prologue]

Let's look at this from the eyes of the stack, with EBP:ESP as our main focus. At line 03 there is a sudden function-call: we jump to the location of addme() and initialize a stack-frame for this subroutine -it's a subroutine because it's definitely called from inside the main() function-. First we save the location of the instruction after the call, aka the return address, and give EIP the new location where the subroutine addme() lies. The return address is saved on the stack by PUSHing it; this is done by the call instruction at line 03. Next we set up a base frame for our subroutine, which is basically the address from which we will reference our parameters and local variables, in other words our stack frame's Home Page. We have two candidates for this task, EBP:ESP, as they are the main registers for stack manipulation. The winner is EBP: since ESP is very dynamic and changes with each POP/PUSH, EBP is used as this reference/home-page/base-frame. We set EBP to the top of the stack -the value of ESP- so we can PUSH/POP data and still have a pivot to reference from; this is done by the instruction at line 16. Clearly EBP is used as every stack-frame's base-frame/home-page, which means the current EBP was our caller's pivot, so we need to save/preserve it before we use EBP as the base frame for our new stack frame. This is done by PUSHing EBP to the stack before setting it as our base frame, with the instruction at line 15.

->Obviously The Stack is designed to efficiently store temporary data at runtime<-

 

 

 

Third:

The inner workings of the subroutine: initializing local variables -if there are any-, executing the function, etc. Everything in the frame is referenced relative to EBP _the base frame_, with the access size dictated by a size specifier, as in lines 18-19. The caller's EBP is PUSHed right after the return address as the start of our stack frame construction, so EBP with offset 0 is the base-frame, and any local variable that is placed on the stack after it is referenced with a -/minus offset _recall, the stack grows toward Zero_. That's one thing; the function's parameters that are passed with the function call are another. These are pushed in lines 01-02, before the call instruction at line 03; this is a logical step in the _Call Stack_, that is, to evaluate the parameters passed to a function before actually jumping to the function's execution. Hence you'll find that a subroutine's parameters are referenced with a +/plus offset relative to EBP, like [ebp+8] and [ebp+0Ch] in lines 18-19.

-> The subroutine's returned value -if there is any- is saved in EAX by convention<-

 

 

 

Fourth:

[The Function's Epilogue]

Now we are done executing the subroutine, so we move everyone back to their place. With the instruction at line 22 we set ESP to the value of our base frame EBP; this automatically kicks out whatever local variables were PUSHed on the stack, so we have a clean slate. Then at line 23 we load the SAVED/PUSHed value of our caller's base frame back into EBP, and at line 24 retn POPs the return address into EIP. This means we are now awake from an inception, yet to continue living -POPing/PUSHing local variables- in our caller's stack frame, or maybe to initiate a new one.

-> Now we actually know why the last Figure 4-8 is the most famous illustration in RE<-

                     

 

 

[+]Calling Conventions | Function Invocation:

 

The way a function receives its parameters and hands back its returned value is dictated by "Calling Conventions". A Calling Convention is a set of rules dictating how function calls work at the machine level. It is defined by the Application Binary Interface 'ABI' for a particular system. That explains the instruction at line 04 -the return address of the function call-: it cleans the parameters off the stack on the caller's side, which means this function call follows the CDECL Calling Convention.

[ Practical Reverse Engineering]

 

It's always good to keep references to have your back. This is the instruction reference for x86 and amd64, with all instructions to look up. This is the "Call Stack" logic, the high level implementation of the stack frame inception.

-> Wake Up, Neo. <-

                     

 

X64

 

X64 is an extension of X86, so it has most of the architecture's properties with minor differences.

 

[+]Registers, Data Types and Arithmetic Operations:

 

This time we have 16 64bit GPRs, named with an R prefix. RBP can still be used as the base frame pointer to reference local variables, yet X64 compilers often treat RBP as just another GPR and reference local variables relative to RSP.

-> X64 supports the concept of RIP-relative addressing.

-> The result of most operations on a 32bit register is automatically zero-extended to the full 64bit register, e.g. after mov eax, 11223344h the whole RAX holds 0x0000000011223344.

 

 

Registers:

X64 has brought 8 additional Registers R8-15:

R8-11 -> Considered Volatile: data stored in them may be lost once another function is called, so the caller saves them if needed.

R12-15 -> Non-volatile: a function that uses them must save and restore them before returning.

 

 

         

[+]Function Invocation:

 

Most calling conventions pass parameters through registers:

-> Windows x64: has one convention, passing the first 4 parameters through RCX, RDX, R8, R9; the remaining parameters are passed on the stack from right to left.

-> Linux x64: the first 6 parameters are passed through the registers RDI, RSI, RDX, RCX, R8, R9; the remaining ones are passed on the stack.

 

That almost concludes THIS discussion about X86 - X64. I cannot promise I won't revisit it, it has to happen: the easier it is to read/understand and spot Code Constructs, the faster we can spot weaknesses and understand malware functionality.

 

 

         

[+]There is more to it than meets the eye
 

Each Process running in memory has the illusion of having its own address space; well, it's not really an illusion, it's more of Hardware-OS Magic. X86 supports the concept of privilege separation through an abstraction called Ring Level. In User-Mode aka Ring3, applications start user-mode processes, each of which comes with its own private virtual address space and handle table. In Kernel-Mode aka Ring0, all code shares a single virtual address space.

 

         

[+]Virtual Address Translation:

 

Memory addresses are divided into Physical Addresses and Virtual Addresses. When Paging is enabled, instructions use Virtual Addresses, while Physical Addresses are the actual memory locations used by the processor when accessing memory. The translation is done by the MMU _Memory Management Unit_, which translates every virtual address into a physical address before memory is accessed. CR0-CR4 are Control Registers, used among other things for Memory Paging and Hardware Virtualization. DR0-DR7 are Debug Registers used for setting Memory Breakpoints: only DR0-DR3 hold breakpoint addresses, the rest are used for status and control.

 

         

[+]Interrupts and Exceptions:

 

Interrupts: Hardware Interrupts are caused by Hardware Devices and are asynchronous by nature. An Interrupt can be thought of as being associated with a number that is an index into an array of function pointers; when an interrupt is received, the CPU executes the function at the index associated with the interrupt, then resumes execution wherever it was before the interrupt. Software Interrupts may be intentionally caused by executing a special instruction which, by design, invokes an interrupt when executed. Such instructions function similarly to subroutine calls and are used for a variety of purposes, such as requesting operating system services and interacting with device drivers.

 

Exceptions: Caused by Instructions, and divided into two categories: "Faults" and "Traps". A Fault is raised when the processor executes an instruction that hits a correctable exception, e.g. a Page Fault: the processor saves the current execution state, lets the handler fix the Page Fault, then re-executes the instruction. A Trap is issued when a program needs servicing from the OS by executing a special kind of instruction: the processor executes the system call handler and resumes execution right after the instruction that trapped.

 

-> Debuggers use Software / Hardware breakpoints. INT3 is a Software interrupt used by debuggers to implement a breakpoint. Its opcode is the single byte 0xCC: the debugger replaces the first byte of the instruction you want to break at with 0xCC, so when execution reaches it the INT3 triggers an exception, and the OS stops the program and transfers control to the debugger -> this is basically self-modifying code, as the code changes while it runs. DR0-DR3 are registers in the CPU that, when set, trigger Hardware Breakpoints: they hold the addresses the CPU will remember to pause at, with DR7 storing the control information. This avoids any change in the Code. Traps are also used by debuggers to Single-Step: setting the TF Trap Flag makes the processor execute only one instruction at a time.

 

 

 

Now if this turned out to be a drag, it's okay; most of these lower-level concepts are easy to forget unless reviewed and practiced. Later we will discuss the Assembly / C Code Constructs, which will lay a good foundation for reverse engineering full Programs.