/RE-Essentials/Windows: Dissecting a PE Binary: PE File Format


Operating systems play a key role in reversing. That’s because programs are
tightly integrated with operating systems, and plenty of information can be
gathered by probing this interface. Moreover, the eventual bottom line of
every program is in its communication with the outside world (the program
receives user input and outputs data on the screen, writes to a file, and so on),
which means that identifying and understanding the bridging points between
application programs and the operating system is critical.  

  [ Secrets of Reverse Engineering ]

 

 

Understanding the core concepts of operating systems, and the format and inner workings of their applications, is required knowledge to ACTUALLY reverse engineer user-land applications, kernel modules/ drivers, or even firmware.

This episode goes through the PE file format, dissecting its headers and highlighting the important entries that reveal useful information and save a lot of time when hunting for anomalies in malicious binaries and detecting malformed ones.

 

 

Friends:

-> CFF Explorer, PEview, HxD

 

This Episode's Flow:

[+] DOS Header

[+] NT Header | PE Header

[+] File Header

[+] Optional Header

[+] Section Headers

 

        

INTRO

 

 

Basic Pseudo Structure of what we need to know about a PE-File.


_DOS_HEADER{
        e_magic;
        e_lfanew;
};
_NT_HEADER{
        Signature;
        _FILE_HEADER{
                Machine;
                TimeDateStamp;
                NumberOfSections;
                Characteristics;
                SizeOfOptionalHeader;
        };
        _OPTIONAL_HEADER{
                Magic;
                AddressOfEntryPoint;
                SizeOfImage;
                SectionAlignment;
                FileAlignment;
                ImageBase;
                DLLCharacteristics;
                _DATA_ENTRY DataDirectory[16]{
                        VirtualAddress;
                        Size;
                };
        };
};
_SECTION_HEADER1{
        Name;
        Misc.VirtualSize;
        VirtualAddress;
        SizeOfRawData;
        PointerToRawData;
        Characteristics;
};
.
.
.
.
_SECTION_HEADERn{
        Name;
        Misc.VirtualSize;
        VirtualAddress;
        SizeOfRawData;
        PointerToRawData;
        Characteristics;
};


 

 

 

DOS Header

 

 

The file starts with the MS-DOS header, followed by the MS-DOS STUB: a tiny DOS program that prints `This program cannot be run in DOS mode`, though this STUB can be changed.

// from winnt.h

typedef struct _IMAGE_DOS_HEADER {
    WORD  e_magic;      /* 00: MZ Header signature */
    WORD  e_cblp;       /* 02: Bytes on last page of file */
    WORD  e_cp;         /* 04: Pages in file */
    WORD  e_crlc;       /* 06: Relocations */
    WORD  e_cparhdr;    /* 08: Size of header in paragraphs */
    WORD  e_minalloc;   /* 0a: Minimum extra paragraphs needed */
    WORD  e_maxalloc;   /* 0c: Maximum extra paragraphs needed */
    WORD  e_ss;         /* 0e: Initial (relative) SS value */
    WORD  e_sp;         /* 10: Initial SP value */
    WORD  e_csum;       /* 12: Checksum */
    WORD  e_ip;         /* 14: Initial IP value */
    WORD  e_cs;         /* 16: Initial (relative) CS value */
    WORD  e_lfarlc;     /* 18: File address of relocation table */
    WORD  e_ovno;       /* 1a: Overlay number */
    WORD  e_res[4];     /* 1c: Reserved words */
    WORD  e_oemid;      /* 24: OEM identifier (for e_oeminfo) */
    WORD  e_oeminfo;    /* 26: OEM information; e_oemid specific */
    WORD  e_res2[10];   /* 28: Reserved words */
    DWORD e_lfanew;     /* 3c: Offset to extended header */
} IMAGE_DOS_HEADER, *PIMAGE_DOS_HEADER;

 

 

 

 

// IMPORTANT ENTRIES:

 

-> e_magic;  ASCII ``MZ``. If the OS loader doesn't find this signature, it will refuse to load the executable.

#define IMAGE_DOS_SIGNATURE    0x5A4D     /* MZ   */

 

 

-> e_lfanew; last member of the DOS-Header structure, contains the file offset of the next header/ structure, the NT Header.

[example for parsing _IMAGE_DOS_HEADER.e_lfanew to find PE-Header start offset - notepad.exe]
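
The walk from e_magic to the PE signature can be sketched in a few lines of C. This is only a minimal sketch, not how the Windows loader does it: `find_nt_headers`, `rd16` and `rd32` are hypothetical helper names, and the file is assumed to be already read into a byte buffer.

```c
#include <stddef.h>
#include <stdint.h>

/* Read little-endian values byte by byte, so the sketch does not depend on
   host endianness or alignment. */
uint16_t rd16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
uint32_t rd32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical helper: returns the file offset of the NT headers,
   or -1 if the buffer does not look like a PE file. */
int64_t find_nt_headers(const uint8_t *buf, size_t len) {
    if (len < 0x40) return -1;                   /* too small for IMAGE_DOS_HEADER */
    if (rd16(buf) != 0x5A4D) return -1;          /* IMAGE_DOS_SIGNATURE, "MZ" */
    uint32_t e_lfanew = rd32(buf + 0x3C);        /* last member of the DOS header */
    if (e_lfanew > len || len - e_lfanew < 4) return -1;  /* offset outside the file */
    if (rd32(buf + e_lfanew) != 0x00004550) return -1;    /* IMAGE_NT_SIGNATURE */
    return (int64_t)e_lfanew;
}
```

The bounds check on e_lfanew matters in practice: malformed or hostile binaries can point it anywhere.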

 

 

 

NT Header | PE Header

 

 

This structure in its entirety is considered the PE-Header.

typedef struct _IMAGE_NT_HEADERS64 {
  DWORD Signature;
  IMAGE_FILE_HEADER FileHeader;
  IMAGE_OPTIONAL_HEADER64 OptionalHeader;
} IMAGE_NT_HEADERS64, *PIMAGE_NT_HEADERS64;

 

It wraps almost everything the Windows loader needs to process the binary: after the MS-DOS STUB come the standard COFF file header and the Windows-specific optional header. This structure has three members, the last two being structures embedded in it.

 

 

 

// IMPORTANT ENTRIES:

 

-> Signature;   ASCII string `PE` followed by two null bytes (`PE\0\0`).

#define IMAGE_NT_SIGNATURE     0x00004550 /* PE00 */

 

 

-> _FILE_HEADER{ ... }; The standard COFF File header.

PE is created over the COFF file format _as an extension to it_, and in some resources you'll find this structure referenced as [ File Header | COFF File header ] .

 

-> _OPTIONAL_HEADER{ ... }; Structure that's needed by the Windows loader to load, set up and execute the binary image, i.e. executables.

 

 

 

[+] COFF File-Header:

 

This is the generic COFF file header. Other formats can extend it according to their own specification, and that extension lives in a separate header; for Windows images it's called _OPTIONAL_HEADER.

 

typedef struct _IMAGE_FILE_HEADER {
  WORD  Machine;
  WORD  NumberOfSections;
  DWORD TimeDateStamp;
  DWORD PointerToSymbolTable;
  DWORD NumberOfSymbols;
  WORD  SizeOfOptionalHeader;
  WORD  Characteristics;
} IMAGE_FILE_HEADER, *PIMAGE_FILE_HEADER;

 

 

 

// IMPORTANT ENTRIES:

 

-> Machine; CPU architecture specification,  [ 32bit architecture -> 0x14C | 64bit architecture -> 0x8664 ]

 

#define	IMAGE_FILE_MACHINE_I386		0x014c

#define	IMAGE_FILE_MACHINE_AMD64	0x8664

 

 

-> NumberOfSections; Contains number of Section Headers ahead.

 

-> TimeDateStamp; Time the binary was built, written by the linker at link time as the number of seconds since 1970-01-01 00:00:00 UTC. It's a very useful field for tracking malicious binaries, but it can be altered to mislead analysts. It's also not the only timestamp in the binary: there is a debug TimeDateStamp entry in the _Debug_Directory{ }; structure, as we will see later.
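
Since the raw value is just a count of seconds since the Unix epoch, turning it into a readable date takes only the standard C library. A minimal sketch; `format_timestamp` is a hypothetical helper name:

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: formats FileHeader.TimeDateStamp as a UTC date string.
   Returns 0 on success, -1 if the value cannot be converted or `out` is too small. */
int format_timestamp(uint32_t time_date_stamp, char *out, size_t cap) {
    time_t t = (time_t)time_date_stamp;
    struct tm *utc = gmtime(&t);          /* the field is expressed in UTC */
    if (utc == NULL) return -1;
    return strftime(out, cap, "%Y-%m-%d %H:%M:%S", utc) ? 0 : -1;
}
```

A date far in the past or in the future is a cheap first hint that the field was tampered with or that the linker does not write a real time there.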

 

-> Characteristics; Attributes of the object or image file to indicate its type.

 

#define IMAGE_FILE_RELOCS_STRIPPED	0x0001 /* No relocation info */
// relocation info is by default stripped for executables

#define IMAGE_FILE_EXECUTABLE_IMAGE	0x0002
// means it's an executable image

#define IMAGE_FILE_SYSTEM		0x1000
// for system files   .sys

#define IMAGE_FILE_DLL			0x2000
// for DLLs

#define IMAGE_FILE_LARGE_ADDRESS_AWARE	0x0020
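// the image can handle addresses above 2GB. Set by the linker (/LARGEADDRESSAWARE): on by default for 64bit images, opt-in for 32bit images, where it only gives extra address space if the system was booted with the "allocate 3GB of memory to userspace" option (rather than the 2/2 split between kernel and userspace) or the image runs on 64bit Windows.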

 

[example parsing _IMAGE_FILE_HEADER.Characteristics - notepad.exe]
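
Decoding the Characteristics word by hand gets tedious, so here is a small sketch that turns it into a readable list. It only knows the five flags listed above, and `describe_file_characteristics` is a hypothetical helper name.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Flag values as listed above (from winnt.h). */
struct flag_name { uint16_t bit; const char *name; };

static const struct flag_name file_flags[] = {
    {0x0001, "RELOCS_STRIPPED"},
    {0x0002, "EXECUTABLE_IMAGE"},
    {0x0020, "LARGE_ADDRESS_AWARE"},
    {0x1000, "SYSTEM"},
    {0x2000, "DLL"},
};

/* Hypothetical helper: writes a '|'-separated list of the known flags
   set in `ch` into `out`. Unknown bits are silently ignored. */
void describe_file_characteristics(uint16_t ch, char *out, size_t cap) {
    if (cap == 0) return;
    out[0] = '\0';
    for (size_t i = 0; i < sizeof file_flags / sizeof file_flags[0]; i++) {
        if (!(ch & file_flags[i].bit)) continue;
        size_t used = strlen(out);
        snprintf(out + used, cap - used, "%s%s", used ? "|" : "", file_flags[i].name);
    }
}
```

For instance, a value of 0x0022 decodes to EXECUTABLE_IMAGE|LARGE_ADDRESS_AWARE.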

 

 

 

 

[+] Optional Headers:

 

 

Windows-specific extension of COFF for images, i.e. the PE file format proper. It's REQUIRED for executables, and OPTIONAL only in the sense that object files don't need it: they can carry one, but it's nothing more than bloat there, as object files are plain COFF and don't really follow the PE image format.

 

typedef struct _IMAGE_OPTIONAL_HEADER64 {
  WORD  Magic; /* 0x20b -> PE32+ | 0x10b -> PE32 */
  BYTE MajorLinkerVersion;
  BYTE MinorLinkerVersion;
  DWORD SizeOfCode;
  DWORD SizeOfInitializedData;
  DWORD SizeOfUninitializedData;
  DWORD AddressOfEntryPoint;
  DWORD BaseOfCode;
  ULONGLONG ImageBase;
  DWORD SectionAlignment;
  DWORD FileAlignment;
  WORD MajorOperatingSystemVersion;
  WORD MinorOperatingSystemVersion;
  WORD MajorImageVersion;
  WORD MinorImageVersion;
  WORD MajorSubsystemVersion;
  WORD MinorSubsystemVersion;
  DWORD Win32VersionValue;
  DWORD SizeOfImage;
  DWORD SizeOfHeaders;
  DWORD CheckSum;
  WORD Subsystem;
  WORD DllCharacteristics;
  ULONGLONG SizeOfStackReserve;
  ULONGLONG SizeOfStackCommit;
  ULONGLONG SizeOfHeapReserve;
  ULONGLONG SizeOfHeapCommit;
  DWORD LoaderFlags;
  DWORD NumberOfRvaAndSizes;
  IMAGE_DATA_DIRECTORY DataDirectory[IMAGE_NUMBEROF_DIRECTORY_ENTRIES];
} IMAGE_OPTIONAL_HEADER64, *PIMAGE_OPTIONAL_HEADER64;

 

 

 

 

// IMPORTANT ENTRIES:

 

-> Magic; The true determinant of whether the image is [ 32bit | 64bit ]. While the [ Machine; ] field in the File Header signifies the CPU/ architecture to run on, this field is what tells the OS loader whether to parse the _OPTIONAL_HEADER with the [ 32bit (PE32, 0x10b) or 64bit (PE32+, 0x20b) ] layout.
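
A disagreement between Machine and Magic is a quick anomaly check when triaging a binary. The sketch below covers only the two machine types mentioned in this episode; both function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: names the optional header layout selected by Magic. */
const char *optional_header_kind(uint16_t magic) {
    switch (magic) {
    case 0x10b: return "PE32";
    case 0x20b: return "PE32+";
    default:    return "unknown";
    }
}

/* Hypothetical helper: returns 1 if FileHeader.Machine and
   OptionalHeader.Magic agree, 0 otherwise. */
int magic_matches_machine(uint16_t machine, uint16_t magic) {
    if (machine == 0x014c) return magic == 0x10b;   /* IMAGE_FILE_MACHINE_I386  */
    if (machine == 0x8664) return magic == 0x20b;   /* IMAGE_FILE_MACHINE_AMD64 */
    return 0;   /* machine type not covered by this sketch */
}
```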

 

-> AddressOfEntryPoint; Contains an RVA -> Relative Virtual Address, i.e. an offset from the image base once the image is mapped in memory. This is where the loader transfers control after loading and setting up the image, where execution begins. It's not necessarily the start of .text or main();, but in general this is where the image starts executing code, and it's typically where a debugger breaks after loading the binary.

 

-> ImageBase; Contains the preferred address at which this image should be mapped in memory. For a 32bit .exe the default is the virtual address 0x00400000 _with no ASLR enabled_, and for 32bit DLLs the default is 0x10000000. Developers were encouraged to `rebase` DLLs, i.e. choose a non-default ImageBase, to avoid collisions and save the loader the burden of relocating the DLL at load time. That was the pre-ASLR era; with ASLR enabled this ImageBase doesn't really matter, as the actual base is randomized by the Kernel Memory Manager.

 [ there is a discussion of REBASING, RELOCATING, and ASLR later... ]
 

 

-> SectionAlignment; Alignment (in bytes) of sections when they are mapped into memory, validated by the loader before mapping. By default it's the architecture's page size -> e.g. [ 0x1000 == 4096bytes ], or a multiple of it, so every section starts at an address that is a multiple of this value.

 

-> FileAlignment; Alignment factor (in bytes) for the raw data of sections in the file on disk. Must be a power of 2; the default value is [ 0x200 -> 512bytes ]: the classic hard disk sector size, though [ 0x1000 -> 4096bytes ] also shows up. A section's raw data is padded out to the next multiple of this value.

 

// SectionAlignment must be equal to or greater than FileAlignment.
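
Both alignment rules boil down to simple bit arithmetic. The sketch below shows how a size is rounded up to an alignment and how the two header fields can be sanity-checked against each other; all three function names are hypothetical, and `align_up` assumes the alignment is a power of 2 and that the sum does not overflow 32 bits.

```c
#include <stdint.h>

/* A power of 2 has exactly one bit set. */
int is_power_of_two(uint32_t x) { return x != 0 && (x & (x - 1)) == 0; }

/* Rounds `value` up to the next multiple of `alignment` (a power of 2). */
uint32_t align_up(uint32_t value, uint32_t alignment) {
    return (value + alignment - 1) & ~(alignment - 1);
}

/* Hypothetical check of the two rules stated above. */
int alignments_plausible(uint32_t section_alignment, uint32_t file_alignment) {
    return is_power_of_two(file_alignment) && section_alignment >= file_alignment;
}
```

So 0x1234 bytes of section data occupy 0x1400 bytes on disk with a FileAlignment of 0x200, and 0x2000 bytes in memory with a SectionAlignment of 0x1000.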

 

 

 

-> DLLCharacteristics;

This field is mostly for security, defining support for ASLR, DEP, Integrity checks ...etc.

 

A-> ASLR: Address Space Layout Randomization.

#define IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE          0x0040

 

Explicit support for ASLR, which allows the image to be relocated anywhere in memory at load time, with a .reloc section listing all the places/ offsets that need fixing.
=> set as a linker option: /DYNAMICBASE

 

For executables, older linkers also needed the option /FIXED:NO to generate a .reloc section, as relocation information was [ OFF ] for executables by default.

 

B-> Integrity Checks.

#define IMAGE_DLLCHARACTERISTICS_FORCE_INTEGRITY       0x0080

 

If set, the loader enforces code integrity checks: the image's digital signature is verified at load time, and if it's missing or invalid the OS loader refuses to load the image.

 

C-> DEP: Data Execution Prevention.

#define IMAGE_DLLCHARACTERISTICS_NX_COMPAT             0x0100

 

Marks the image as compatible with DEP, so its stack/ heap/ data pages are mapped as non-executable and trying to execute code from them raises an exception.

=> set by the linker option /NXCOMPAT

 

D-> NO_SEH.

#define IMAGE_DLLCHARACTERISTICS_NO_SEH                0x0400

 

Long story short, if set it means that this binary doesn't use Structured Exception Handling, so no SE handler inside this image may be called.

// not to be confused with the linker option /SAFESEH, which does not set this flag: for 32bit images it emits a table of the image's valid exception handlers.

 

 

E-> Control Flow Guard.

#define IMAGE_DLLCHARACTERISTICS_GUARD_CF              0x4000 

 

An image with explicit support for Control Flow Guard, which validates the targets of indirect calls at runtime.

=> set by linker option /guard:cf

=> disabled by /guard:cf-

 

F-> Terminal Server Aware Applications.

#define IMAGE_DLLCHARACTERISTICS_TERMINAL_SERVER_AWARE 0x8000

 

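Marks the application as Terminal Server aware, i.e. it behaves correctly in a multi-user Remote Desktop Services environment (the one you reach over RDP `Remote Desktop Protocol`), so the system doesn't apply compatibility workarounds to it.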

=> set by linker option /TSAWARE

 

 

[ _IMAGE_OPTIONAL_HEADER.DllCharacteristics of notepad.exe]
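
When triaging a binary, it's often quicker to ask which mitigations are missing than to read the whole bitfield. The sketch below does that with the flag values listed above; `missing_mitigations` is a hypothetical helper, and the choice of ASLR, DEP and CFG as the "wanted" set is an assumption for illustration, not an official baseline.

```c
#include <stdint.h>

/* Flag values as listed above (from winnt.h). */
enum {
    DLLCHAR_DYNAMIC_BASE = 0x0040,   /* ASLR */
    DLLCHAR_NX_COMPAT    = 0x0100,   /* DEP  */
    DLLCHAR_GUARD_CF     = 0x4000    /* Control Flow Guard */
};

/* Hypothetical helper: returns the subset of the wanted mitigation flags
   that is NOT set in OptionalHeader.DllCharacteristics (0 means all present). */
uint16_t missing_mitigations(uint16_t dll_characteristics) {
    const uint16_t wanted = DLLCHAR_DYNAMIC_BASE | DLLCHAR_NX_COMPAT | DLLCHAR_GUARD_CF;
    return (uint16_t)(wanted & ~dll_characteristics);
}
```

A return value of 0x4000, for example, means the image opts into ASLR and DEP but was not built with /guard:cf.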

 

 

-> _IMAGE_DATA_DIRECTORY DataDirectory[IMAGE_NUMBEROF_DIRECTORY_ENTRIES];

Basically an array of [ RVA, Size ] pairs locating structures that are called `Data Directories`, with information about imports, exports, relocations, TLS, debug information, signatures ..etc

 

/* Directory Entries, indices into the DataDirectory array */

#define	IMAGE_DIRECTORY_ENTRY_EXPORT		0
#define	IMAGE_DIRECTORY_ENTRY_IMPORT		1
#define	IMAGE_DIRECTORY_ENTRY_RESOURCE		2
#define	IMAGE_DIRECTORY_ENTRY_EXCEPTION		3
#define	IMAGE_DIRECTORY_ENTRY_SECURITY		4
#define	IMAGE_DIRECTORY_ENTRY_BASERELOC		5
#define	IMAGE_DIRECTORY_ENTRY_DEBUG		6
#define	IMAGE_DIRECTORY_ENTRY_COPYRIGHT		7
#define	IMAGE_DIRECTORY_ENTRY_GLOBALPTR		8   /* (MIPS GP) */
#define	IMAGE_DIRECTORY_ENTRY_TLS		9
#define	IMAGE_DIRECTORY_ENTRY_LOAD_CONFIG	10
#define	IMAGE_DIRECTORY_ENTRY_BOUND_IMPORT	11
#define	IMAGE_DIRECTORY_ENTRY_IAT		12  /* Import Address Table */
#define	IMAGE_DIRECTORY_ENTRY_DELAY_IMPORT	13
#define	IMAGE_DIRECTORY_ENTRY_COM_DESCRIPTOR	14

 

For each structure/ data directory ->

typedef struct _IMAGE_DATA_DIRECTORY {
  DWORD VirtualAddress;
  DWORD Size;
} IMAGE_DATA_DIRECTORY, *PIMAGE_DATA_DIRECTORY;

 

-> VirtualAddress; In fact an RVA `Relative Virtual Address` to the data structure (the one exception is the SECURITY entry, where it's a file offset).

-> Size; Size of the data structure.

 

 

[ _IMAGE_OPTIONAL_HEADER.DataDirectory[] of notepad.exe]
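
Because PE32 and PE32+ optional headers have different sizes, the DataDirectory array does not sit at the same offset in both: NumberOfRvaAndSizes is at offset 92 in PE32 and 108 in PE32+, with the array right after it. A minimal sketch of fetching one entry from the raw optional header bytes; `get_data_directory` and `struct data_dir` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

struct data_dir { uint32_t rva; uint32_t size; };

static uint16_t rd16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static uint32_t rd32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* `opt` points at the first byte of the optional header and `opt_len` is
   FileHeader.SizeOfOptionalHeader. Returns 0 on success, -1 on failure. */
int get_data_directory(const uint8_t *opt, size_t opt_len,
                       unsigned index, struct data_dir *out) {
    if (opt_len < 2) return -1;
    uint16_t magic = rd16(opt);
    size_t count_off;                       /* offset of NumberOfRvaAndSizes */
    if (magic == 0x10b) count_off = 92;     /* PE32  */
    else if (magic == 0x20b) count_off = 108;   /* PE32+ */
    else return -1;
    if (opt_len < count_off + 4) return -1;
    uint32_t count = rd32(opt + count_off);
    if (index >= count || index >= 16) return -1;   /* 16 == IMAGE_NUMBEROF_DIRECTORY_ENTRIES */
    size_t entry_off = count_off + 4 + (size_t)index * 8;
    if (opt_len < entry_off + 8) return -1;         /* entry must fit in the header */
    out->rva  = rd32(opt + entry_off);
    out->size = rd32(opt + entry_off + 4);
    return 0;
}
```

Trusting NumberOfRvaAndSizes instead of assuming 16 entries is deliberate, since malformed binaries sometimes declare fewer.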

 

 

Section Headers:

 

 

IMMEDIATELY AFTER the Optional Header, i.e. right after the PE | NT-Headers, there is an array of structures | SECTION HEADERS with [ _IMAGE_FILE_HEADER.NumberOfSections ] elements. Each structure/ section header consists of:


typedef struct _IMAGE_SECTION_HEADER {
  BYTE  Name[IMAGE_SIZEOF_SHORT_NAME];
  union {
    DWORD PhysicalAddress;
    DWORD VirtualSize;
  } Misc;
  DWORD VirtualAddress;
  DWORD SizeOfRawData;
  DWORD PointerToRawData;
  DWORD PointerToRelocations;
  DWORD PointerToLinenumbers;
  WORD  NumberOfRelocations;
  WORD  NumberOfLinenumbers;
  DWORD Characteristics;
} IMAGE_SECTION_HEADER, *PIMAGE_SECTION_HEADER;
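
Locating the section table is plain offset arithmetic: skip the 4-byte Signature, the 20-byte File Header, and SizeOfOptionalHeader bytes of Optional Header. A minimal sketch, assuming e_lfanew was already read from the DOS header; `section_table_offset` and `rd16` are hypothetical helper names.

```c
#include <stddef.h>
#include <stdint.h>

static uint16_t rd16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }

/* Returns the file offset of the first IMAGE_SECTION_HEADER and stores
   NumberOfSections in *num_sections, or returns -1 if the table does not fit. */
int64_t section_table_offset(const uint8_t *buf, size_t len,
                             uint32_t e_lfanew, uint16_t *num_sections) {
    uint64_t fh = (uint64_t)e_lfanew + 4;          /* skip the 4-byte Signature */
    if (fh + 20 > len) return -1;                  /* IMAGE_FILE_HEADER is 20 bytes */
    uint16_t n        = rd16(buf + fh + 2);        /* NumberOfSections */
    uint16_t opt_size = rd16(buf + fh + 16);       /* SizeOfOptionalHeader */
    uint64_t table = fh + 20 + opt_size;
    if (table + (uint64_t)n * 40 > len) return -1; /* each section header is 40 bytes */
    *num_sections = n;
    return (int64_t)table;
}
```

Using SizeOfOptionalHeader rather than sizeof(IMAGE_OPTIONAL_HEADER64) keeps the same code working for PE32 and PE32+ files.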

 

 

 

// IMPORTANT ENTRIES:

 

-> Name[8]; 8-byte array of UTF-8 characters containing the name of the section, null-padded if the name is shorter than 8 bytes; if it's exactly 8 bytes there is no terminating null character.
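
The missing terminator is a classic source of bugs: passing Name straight to printf("%s") or strlen() can read past the field. A minimal sketch of a safe copy; `section_name` is a hypothetical helper name.

```c
#include <stdint.h>
#include <string.h>

/* Copies the 8-byte Name field into a 9-byte buffer and always terminates it,
   so full-length names are handled as well as null-padded ones. */
void section_name(const uint8_t raw[8], char out[9]) {
    memcpy(out, raw, 8);
    out[8] = '\0';
}
```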

 

-> Misc.VirtualSize; Size of the section in memory.

// A UNION stores multiple interpretations of the same exact data, so Misc.VirtualSize and Misc.PhysicalAddress occupy the same 4 bytes. All we will see in images is Misc.VirtualSize, since PhysicalAddress is not really used anymore.

 

-> VirtualAddress; RVA -> Offset to the Section relative to the Image's base address.

// Absolute Virtual Address of a section == [ SectionHeader.VirtualAddress + OptionalHeader.ImageBase ]

 

-> SizeOfRawData; Contains the size of the section on disk, a multiple of FileAlignment.

 

-> PointerToRawData; Contains the offset to the section on disk, relative to the beginning of the file.

 

// When VirtualSize is larger than SizeOfRawData, which is common for writable sections holding uninitialized data, the loader zero-fills the rest of the section in memory.
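
These four fields are all that's needed to translate an RVA into a file offset, which is the conversion you do constantly when following Data Directory entries in a hex editor. A minimal sketch: `struct section` is a hypothetical, trimmed-down view of the section header, and `rva_to_file_offset` is a hypothetical helper name.

```c
#include <stddef.h>
#include <stdint.h>

/* Only the fields this sketch needs from IMAGE_SECTION_HEADER. */
struct section {
    uint32_t virtual_size;
    uint32_t virtual_address;
    uint32_t size_of_raw_data;
    uint32_t pointer_to_raw_data;
};

/* Returns the file offset backing `rva`, or -1 if no section's raw data covers it
   (for example an RVA in the zero-filled tail of a section, or in no section at all). */
int64_t rva_to_file_offset(const struct section *s, size_t n, uint32_t rva) {
    for (size_t i = 0; i < n; i++) {
        if (rva < s[i].virtual_address) continue;
        uint32_t delta = rva - s[i].virtual_address;
        if (delta < s[i].size_of_raw_data)      /* only bytes that exist on disk */
            return (int64_t)s[i].pointer_to_raw_data + delta;
    }
    return -1;
}
```

With a section at VirtualAddress 0x1000 and PointerToRawData 0x400, the RVA 0x1234 lands at file offset 0x634.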

 

 

-> Characteristics; Different characteristics of a section including memory permissions, type of data in the section ..etc

 

#define IMAGE_SCN_CNT_CODE			0x00000020
// section contains code

#define IMAGE_SCN_CNT_INITIALIZED_DATA		0x00000040
// section contains initialized global data

#define IMAGE_SCN_CNT_UNINITIALIZED_DATA	0x00000080
// section contains uninitialized global data

#define IMAGE_SCN_MEM_DISCARDABLE		0x02000000
// e.g. .reloc: once it has done its job at load time, the section can be thrown away

#define IMAGE_SCN_MEM_NOT_CACHED		0x04000000
#define IMAGE_SCN_MEM_NOT_PAGED			0x08000000
// the section cannot be paged out of memory to disk, mostly relevant to kernel drivers

#define IMAGE_SCN_MEM_SHARED			0x10000000
#define IMAGE_SCN_MEM_EXECUTE			0x20000000
#define IMAGE_SCN_MEM_READ			0x40000000
#define IMAGE_SCN_MEM_WRITE			0x80000000

 

[ _IMAGE_SECTION_HEADER.Characteristics of notepad.exe ]
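
The three memory permission bits are the part of this field you'll look at most, and a section that is both writable and executable is a common red flag for packed or self-modifying code. A minimal sketch using the flag values listed above; both function names are hypothetical.

```c
#include <stdint.h>

/* Writes a 3-character "RWX"-style string (plus terminator) for a section. */
void section_perms(uint32_t characteristics, char out[4]) {
    out[0] = (characteristics & 0x40000000u) ? 'R' : '-';   /* IMAGE_SCN_MEM_READ    */
    out[1] = (characteristics & 0x80000000u) ? 'W' : '-';   /* IMAGE_SCN_MEM_WRITE   */
    out[2] = (characteristics & 0x20000000u) ? 'X' : '-';   /* IMAGE_SCN_MEM_EXECUTE */
    out[3] = '\0';
}

/* Returns 1 if the section is both writable and executable. */
int is_write_execute(uint32_t characteristics) {
    return (characteristics & 0x80000000u) != 0 &&
           (characteristics & 0x20000000u) != 0;
}
```

A typical code section with Characteristics 0x60000020 comes out as R-X, and a typical data section with 0xC0000040 as RW-.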

 

 

[+] Sections:

 

Sections are portions of data/ code with similar memory protections and purpose grouped together. In the section table they are ordered by ascending VirtualAddress. Common sections are:

 

[+] .text:

Contains code, with EXECUTE/ READ protections. In kernel drivers the .text section is NON_PAGEABLE by default: it cannot be paged out of memory to disk.

 

[+] .data:

Contains global data that can be changed, READ/ WRITE protections.

 

[+] .bss:

Contains data that's not initialized, gets merged with .data, takes no space on disk but takes space in memory.

 

[+] .rdata:

READ-ONLY data, e.g. string literals and constants.

 

[+] .idata:

Contains imports information, often merged into .rdata by the linker.

 

[+] .edata:

Contains exports information, often merged into .rdata by the linker.

 

[+] .pdata:

Contains exception processing information (the function table used for stack unwinding on x64), usually kept as its own section.

 

[+] .reloc:

Contains relocation information with all the constants that need fixing by the loader.

 

[+] .rsrc:

Contains resources, from icons in different resolutions to stubs and kernel modules that are dropped and later executed _in case of malware_.

 

 

// SECTIONS can have any name, and can be merged by the Linker, though the linker will warn about merging sections of different memory permissions.

          

 

 

For an understanding of the PE file format, this pretty much does the job. In later parts we will dive deep into the Data Directories, discussing Imports/ Exports/ Debug Information/ Relocations and much more.