Writing a simple x86 emulator with IDAPython

Often times, when I stumble upon IDAPython scripts, I notice that they are using inefficient / incorrect IDAPython APIs to disassemble or decode instructions (for instance using idc.GetMnem() or idc.GetDisasm()). Therefore, in this blog post, I am going to illustrate how to use IDA’s instruction decoding functions from IDAPython in order to write a very simple x86 emulator. The goal is to demonstrate the correct use of instruction decoding IDAPython APIs. By the end of this post, you should be able to solve similar problems using IDAPython.

Table of Contents

Overview
Disassembling the program
Writing the emulator
Quick intro to instruction decoding
Scoping challenge functions
Emulating instructions

 

Overview

Let’s get started with the following sample program which contains a table of 12 challenge functions that get called in a loop and the result is displayed. Our goal is to write an emulator that can compute the challenge response value statically without using a 3rd party emulator.

#include <stdio.h>
#include <cstdint>
#include <time.h>
#include <stdlib.h>

typedef uint64_t (*challenge_proto_t)(uint32_t, uint32_t);

//-------------------------------------------------------------------------
uint64_t challenge_1(uint32_t c1, uint32_t c2) // 39 operation(s)
{
    uint64_t result;
    __asm
    {
        pushad
        mov eax, [c1]
        mov edx, [c2]
        not eax
        dec edx
        xor edx, eax
        xor edx, eax
        inc eax
        not eax
        sub edx, 0x27c12466
        inc eax
        dec edx
        not edx
        inc eax
        add eax, 0x273804ca
        xor edx, 0xaa5a1584
        sub eax, edx
        not edx
        xor eax, 0xf94f7d8c
        dec edx
        dec eax
        sub eax, edx
        not edx
        dec edx
        sub edx, 0xd7b41b83
        xor eax, 0xa551a9c7
        add eax, eax
        dec eax
        inc eax
        not eax
        add edx, 0xa551b974
        inc edx
        dec edx
        not edx
        xor eax, 0x200d519
        not edx
        not eax
        sub edx, 0xeb15b7ef
        xor eax, 0xb2558b8c
        xor eax, 0xda288d90
        not eax
        not edx
        mov dword ptr[result], eax
        mov dword ptr[result + 4], edx

        popad
    }
    return result;
}

// .
// .
// challenge_2 .. challenge_12
// .
// .

challenge_proto_t challenge_funcs[] = {
    challenge_1,
    challenge_2,
    challenge_3,
    challenge_4,
    challenge_5,
    challenge_6,
    challenge_7,
    challenge_8,
    challenge_9,
    challenge_10,
    challenge_11,
    challenge_12
};

//-------------------------------------------------------------------------
int main(int argc, char *argv[])
{
    if (argc < 4)
    {
        printf("challenge func[0..%d] challenge1-32 challenge2-32 -> result-64\n", _countof(challenge_funcs));
        return -1;
    }

    uint32_t f = atol(argv[1]) % _countof(challenge_funcs);

    uint32_t c1 = atol(argv[2]);
    uint32_t c2 = atol(argv[3]);

    printf("challenge_%d(%d, %d) -> %016I64X\n", f, c1, c2, challenge_funcs[f](c1, c2));

    return 0;
}

In real life, one can find him/herself in a similar situation wanting to treat a function as a blackbox instead of reversing it back into pseudo-code. In such cases, one has many choices:

  • Use Appcall during runtime and query the challenge function directly.
  • Use a decompiler to extract the algorithm in C pseudo-code and recompile it then use it.
  • Extract the challenge function assembly listing, reassemble and use it.
  • Use an emulation framework (the Unicorn engine for example) to emulate the challenge function.

Let’s compile this program and run it with c1=123 and c2=456:

C:\>for /l %a in (0, 1, 11) do @test.exe %a 123 456
challenge_0(123, 456)  -> 8FDCE2E203FCAAF2
challenge_1(123, 456)  -> E0317E1AB061ED8B
challenge_2(123, 456)  -> A0A0E0C2279BE734
challenge_3(123, 456)  -> 5D18D0A79D07D7D8
challenge_4(123, 456)  -> 2583B4EEB62E6042
challenge_5(123, 456)  -> D5261E0275AB9805
challenge_6(123, 456)  -> F2B3282E143F7927
challenge_7(123, 456)  -> 9B9B3CBB0169F4CD
challenge_8(123, 456)  -> EF51086C5D1AF235
challenge_9(123, 456)  -> FC8A97125C0EA232
challenge_10(123, 456) -> EEAE8BEB7996D2E7
challenge_11(123, 456) -> 4F36E6A65AB03929

Disassembling the program

Let’s disassemble the test program and locate the challenge_funcs table and the first challenge function:

;
; The challenge functions as referenced from main()
; (12 functions)
;
.rdata:0041749C 00 10 40 00             challenge_funcs dd offset sub_401000
.rdata:0041749C                                   ; DATA XREF: _main+57↑r
.rdata:004174A0 90 10 40 00             dd offset sub_401090
.rdata:004174A4 30 11 40 00             dd offset sub_401130
.rdata:004174A8 D0 11 40 00             dd offset sub_4011D0
.rdata:004174AC 80 12 40 00             dd offset sub_401280
.rdata:004174B0 30 13 40 00             dd offset sub_401330
.rdata:004174B4 A0 13 40 00             dd offset sub_4013A0
.rdata:004174B8 30 14 40 00             dd offset sub_401430
.rdata:004174BC C0 14 40 00             dd offset sub_4014C0
.rdata:004174C0 30 15 40 00             dd offset sub_401530
.rdata:004174C4 E0 15 40 00             dd offset sub_4015E0
.rdata:004174C8 80 16 40 00             dd offset sub_401680

;
; The first challenge function
;
.text:00401000                         ; int __cdecl sub_401000(int c1, int c2)
.text:00401000                         sub_401000 proc near
.text:00401000 ; CODE XREF: _main+65↓p
.text:00401000 ; DATA XREF: .rdata:challenge_funcs↓o
.text:00401000
.text:00401000                         var_8= dword ptr -8
.text:00401000                         var_4= dword ptr -4
.text:00401000                         c1= dword ptr  8
.text:00401000                         c2= dword ptr  0Ch
.text:00401000
.text:00401000 55                      push    ebp
.text:00401001 8B EC                   mov     ebp, esp
.text:00401003 83 EC 08                sub     esp, 8
.text:00401006 53                      push    ebx
.text:00401007 56                      push    esi
.text:00401008 57                      push    edi
.text:00401009 60                      pusha
.text:0040100A 8B 45 08                mov     eax, [ebp+8]
.text:0040100D 8B 55 0C                mov     edx, [ebp+c2]
.text:00401010 F7 D0                   not     eax
.text:00401012 4A                      dec     edx
.text:00401013 33 D0                   xor     edx, eax
.text:00401015 33 D0                   xor     edx, eax
.text:00401017 40                      inc     eax
.text:00401018 F7 D0                   not     eax
.text:0040101A 81 EA 66 24 C1 27       sub     edx, 27C12466h
.text:00401020 40                      inc     eax
.text:00401021 4A                      dec     edx
.text:00401022 F7 D2                   not     edx
.
.
.
.text:00401076 F7 D2                   not     edx
.text:00401078 89 45 F8                mov     [ebp+var_8], eax
.text:0040107B 89 55 FC                mov     [ebp+var_4], edx
.text:0040107E 61                      popa
.text:0040107F 8B 45 F8                mov     eax, [ebp+var_8]
.text:00401082 8B 55 FC                mov     edx, [ebp+var_4]
.text:00401085 5F                      pop     edi
.text:00401086 5E                      pop     esi
.text:00401087 5B                      pop     ebx
.text:00401088 8B E5                   mov     esp, ebp
.text:0040108A 5D                      pop     ebp
.text:0040108B C3                      retn
.text:0040108B
.text:0040108B                         sub_401000 endp
.text:0040108B

We could easily locate the challenge_funcs table because it is referenced from main(). The first challenge function, like all of the others, have a very distinct format/pattern on which we will base the emulator design.

We can distinctly see a pusha instruction (at 0x401009), followed by two instructions that load the initial values (at 0x40100A and 0x40100D), then a series of operations (between 0x401010 and 0x401076) on those registers and finally we see the results being copied back into local variables (at 0x401078 and 0x40107B) before popa is used to restore all registers.

We will use this code pattern to write a small function that identify the boundaries of the instructions that do the computation. We will then write another function that emulates code within a given range and returns the result.

Writing the emulator

In this section, we will implement two functions:

  • scope_challenge_function(): this function finds the instructions boundaries to be emulated.
  • emulate_challenge_function(): this function emulates instructions within a given range.

Before we get started, let’s define some global variables needed by the script:

import idc, idautils, idaapi

challenge_funcs_tbl = 0x41749C
challenge_funcs_tbl_size = 12

RESULTS = (
    0x8FDCE2E203FCAAF2,
    0xE0317E1AB061ED8B,
    0xA0A0E0C2279BE734,
    0x5D18D0A79D07D7D8,
    0x2583B4EEB62E6042,
    0xD5261E0275AB9805,
    0xF2B3282E143F7927,
    0x9B9B3CBB0169F4CD,
    0xEF51086C5D1AF235,
    0xFC8A97125C0EA232,
    0xEEAE8BEB7996D2E7,
    0x4F36E6A65AB03929)

We deduced the challenge functions table and its size from the disassembly above. We also define the RESULTS variable containing the output of calling each challenge function with c1=123 and c2=456. We will use that table to verify the emulation after we are done.

To enumerate all the challenge functions in the table, we can do something like:

for i in range(0, challenge_funcs_tbl_size):
    func = idc.Dword(challenge_funcs_tbl +  i * 4)
    print(">%x: challenge #%d" % (func, i + 1))

…and the output is:

>401000: challenge #1
>401090: challenge #2
>401130: challenge #3
>4011d0: challenge #4
>401280: challenge #5
>401330: challenge #6
>4013a0: challenge #7
>401430: challenge #8
>4014c0: challenge #9
>401530: challenge #10
>4015e0: challenge #11
>401680: challenge #12

Quick intro to instruction decoding

To decode instructions using IDAPython, use the idautils.DecodeInstruction() function:

# .text:0040101A 81 EA 66 24 C1 27       sub     edx, 27C12466h
inst = idautils.DecodeInstruction(0x40101A)

If decoding fails, then this function returns None. If decoding succeeds, we get an instruction object containing information about the instruction and its operands.

These are the important instruction attributes:

  • inst.itype: this is an integer representing the instruction type. Different opcodes have the same itype and hence opcode != itype.
  • inst.size: this is the size of the decoded instruction.
  • inst.Operands[]: this is a zero based array containing operands information.
  • inst.Op1 .. OpN: these are 1-based aliases into the Operands array.
  • inst.ea: the linear address of the decoded instruction.

You might be wondering what is the relationship between an opcode and its itype? The answer is simple. In IDA, the open database’s processor module is responsible for filling the itype field based on the opcode. In the IDA SDK, you can find a header file called allins.hpp. This header file contains enums for all supported processor modules along with enum members for each supported instruction:

// Excerpt from allins.hpp

// x86/x64 itypes
enum
{
NN_null = 0,            // Unknown Operation
NN_aaa,                 // ASCII Adjust after Addition
NN_aad,                 // ASCII Adjust AX before Division
NN_aam,                 // ASCII Adjust AX after Multiply
NN_aas,                 // ASCII Adjust AL after Subtraction
.
.
.
NN_jz,                  // Jump if Zero (ZF=1)
NN_jmp,                 // Jump
NN_jmpfi,               // Indirect Far Jump
NN_jmpni,               // Indirect Near Jump
NN_jmpshort,            // Jump Short (not used)
NN_lahf,                // Load Flags into AH Register
.
.
.
// Pentium III Pseudo instructions

NN_cmpeqps,             // Packed Single-FP Compare EQ
NN_cmpltps,             // Packed Single-FP Compare LT
NN_cmpleps,             // Packed Single-FP Compare LE
NN_cmpunordps,          // Packed Single-FP Compare UNORD
.
.
.
}

I don’t know why, but the NN_ prefix is used for all the instruction types on the x86/x64 processor.

Here’s a simple example on how to decode and check the instruction type:

# .text:00402085 74 09 jz short loc_402090
inst = idautils.DecodeInstruction(0x402085)
print("YES" if inst.itype == idaapi.NN_jz else "NO")

One can intuitively check what the decoded instruction is by comparing against one of the idaapi.NN_xxxx constants.

As for operands, one can access them via inst.Operands[] or inst.OpN. To get the number of operands used by the decoded instruction, you should not rely on the length of the Operands array because it will always resolve to UA_MAXOP == 8 (see ida.hpp). Instead, iterate over each operand and see if its type is o_void.

An instruction operand is defined using the op_t structure type defined in the ua.hpp header file.

Some of the operand members are:

  • op.flags: operands flags.
  • op.dtype: operand type. One of the dt_xxx consts. One can use this field to tell the size of the operand (1 == dt_byte, 2 == dt_word, etc.).
  • op.type: operand type. One of the o_xxx consts.
  • specflag1.. specflag4: processor specific flags.

These are the supported operand types (o_xxx):

  • o_void: no operand present.
  • o_reg: operand is a  register (al, ax,es,ds…).
  • o_mem: direct memory reference (DATA).
  • o_phrase: memory Reference [Base Reg + Index Reg].
  • o_displ: memory Reg [Base Reg + Index Reg + Displacement].
  • o_imm: immediate value.
  • o_far: immediate far Address (CODE).
  • o_near: immediate near Address (CODE).
  • o_idpspec0 .. o_idpspec5: processor specific flags.

There are additional operand members whose meaning differ based on the operand’s type:

  • op.reg: the register number (o_reg).
  • op.phrase: index register with memory accessing operands (o_phrase).
  • op.value: the immediate value (o_imm) or the outer displacement (o_displ).
  • op.addr: memory address used by the operand (o_mem, o_far, o_displ, o_near).

When the operand type is o_reg or o_phrase, then the op.reg/op.phrase values contain the register’s enum value. Like the NN_xxx terminology, the IDA SDK also provides the register constant names and their values; however this is only true for the x86/x64 processor module. Here’s an excerpt from the header file intel.hpp:

enum RegNo
{
  R_ax = 0,
  R_cx,
  R_dx,
  R_bx,
  R_sp,
  R_bp,
  R_si,
  R_di,
  R_r8,
  R_r9,
  R_r10,
  R_r11,
  R_r12,
  R_r13,
  R_r14,
  R_r15,
.
.
.
}

Unfortunately though, those enums are not exposed to IDAPython, but at least we know enough to define something like the following:

REG_EAX = 0
REG_EDX = 2
REG_EBP = 5
.
.
.
REG_NAMES = { REG_EAX: 'eax', REG_EDX: 'edx', REG_EBP: 'ebp' ...}

Here’s another example on how we can fully ‘disassemble’ an instruction:

# .text:0040106F 35 90 8D 28 DA xor     eax, 0DA288D90h
out = ''
inst = idautils.DecodeInstruction(0x40106F)
out += "XOR "     if inst.itype == idaapi.NN_xor else ""
out += "EAX"      if (inst.Op1.type == idaapi.o_reg and inst.Op1.reg == 0) else ""
out += ", 0x%08X" % inst.Op2.value if (inst.Op2.type == idaapi.o_imm) else ""

print(out)

# Outputs: XOR EAX, 0xDA288D90

That covers the instruction decoding principles. Please refer to header files intel.hpp, allins.hpp, ua.hpp and idp.hpp for more information.

Scoping the challenge functions

Earlier, we figured out how to go over the challenge functions table and retrieve the address of each challenge function. Let’s now write a function that uses the instruction decoder to find the boundaries of the instructions to be emulated.

Please note that I can use IDAPython’s FindBinary() but that defies the purpose of this article. For demonstration purposes, I want to find the code pattern in question using instruction decoding only:

def scope_challenge_function(func_ea):
    f = idaapi.get_func(func_ea)
    if f is None:
        return (False, "No function at address!")
        
    emu_start, emu_end = f.startEA, f.endEA
    
    ea = emu_start

    #    
    # Find the start of the emulation pattern
    #
    stage = 0
    while ea <= emu_end:
        inst = idautils.DecodeInstruction(ea)
        if inst is None:
            return (False, "Could not decode")
            
        # Advance to next instruction
        ea += inst.size
        
        # mov (eax|edx), [ebp+?]
        if (inst.itype == idaapi.NN_mov) and (inst.Operands[0].type == idaapi.o_reg) and \
           (inst.Operands[1].type == idaapi.o_displ) and (inst.Operands[1].phrase == REG_EBP):
            # mov eax, [ebp+8]
            if (stage == 0) and (inst.Operands[0].reg == REG_EAX) and (inst.Operands[1].addr == 8):
                stage = 1
            # mov edx, [ebp+0xC]
            elif (stage == 1) and (inst.Operands[0].reg == REG_EDX) and (inst.Operands[1].addr == 0xC):
                stage = 2
                emu_start = ea
        elif (stage == 2) and (inst.itype == idaapi.NN_popa):
            # Let's decode backwards twice and double check the pattern
            ea2 = idc.PrevHead(ea)
            
            # Disassemble backwards
            for _ in range(0, 2):
                ea2 = idc.PrevHead(ea2)

                inst = idautils.DecodeInstruction(ea2)
                if (inst.itype == idaapi.NN_mov) and (inst.Op1.type == idaapi.o_displ) and \
                   (inst.Op1.reg == 5):
                    if inst.Op2.reg == 2 and stage == 2:
                        stage = 3
                    elif inst.Op2.reg == 0 and stage == 3:
                        stage = 4
                        emu_end = ea2
                        break
                   
            break
            
       
    if stage != 4:
        return (False, "Could not find markers")
            
    return (True, (emu_start, emu_end))

The basic pattern when decoding instructions is to advance the decoding address (the ea variable in this case) by inst.size after each successful decoding. Afterwards, one should check the instruction’s itype, then inspect the operands accordingly.

Note that at stage #2, I start decoding backwards. To go backwards in a proper disassembly listing, one can use the idc.PrevHead() function to retrieve the start address of the previously defined instruction (see line 37). Let’s test this function:

Python>ok, info = scope_challenge_function(0x401000)
Python>if ok: print("start=%08X end=%08X" % (info[0], info[1]))
start=00401010 end=00401078

Emulating instructions

In the previous step, we managed to retrieve a start and end address for the emulation boundaries. Now, let’s write a simple emulation function that only supports a limited set of instructions (NOT, DEC, INC, XOR, SUB and ADD):

def emulate_challenge_function(info, c1, c2, dbg = False):
    emu_start, emu_end = info
    if dbg:
        print("Emulating from %x to %x (%d, %d)" % (emu_start, emu_end, c1, c2))

    # Reset registers    
    regs = { 
      REG_EAX: c1,
      REG_EDX: c2
    }
    
    def get_opr_val(inst, regs):
        if inst.Op2.type == o_imm:
            return (True, inst.Op2.value)
        elif inst.Op2.type == idaapi.o_reg:
            return (True, regs[inst.Op2.reg])
        else:
            return (False, 0)
            
    ea = emu_start
    while ea < emu_end:
        out = ">%x: " % ea
        ok = True
        inst = idautils.DecodeInstruction(ea)
        ea += inst.size
        if inst.itype == idaapi.NN_not:
            out += "NOT"
            regs[inst.Op1.reg] = ~regs.get(inst.Op1.reg, 0) & 0xffffffff
        elif inst.itype == idaapi.NN_dec and inst.Op1.type == idaapi.o_reg:
            out += "DEC"        
            regs[inst.Op1.reg] = (regs.get(inst.Op1.reg, 0) - 1) & 0xffffffff
        elif inst.itype == idaapi.NN_inc and inst.Op1.type == idaapi.o_reg:
            out += "INC"        
            regs[inst.Op1.reg] = (regs.get(inst.Op1.reg, 0) + 1) & 0xffffffff
        elif inst.itype == idaapi.NN_xor:
            ok, val = get_opr_val(inst, regs)
            regs[inst.Op1.reg] = (regs.get(inst.Op1.reg, 0) ^ val) & 0xffffffff
            out += "XOR %08X" % val
        elif inst.itype == idaapi.NN_sub:
            ok, val = get_opr_val(inst, regs)
            regs[inst.Op1.reg] = (regs.get(inst.Op1.reg, 0) - val) & 0xffffffff
            out += "SUB %08X" % val
        elif inst.itype == idaapi.NN_add:
            ok, val = get_opr_val(inst, regs)
            regs[inst.Op1.reg] = (regs.get(inst.Op1.reg, 0) + val) & 0xffffffff
            out += "ADD %08X" % val
        else:
            ok = False

        # Dump registers
        for k, v in regs.items():
            out += (" [%s: %08X] " % (REG_NAMES.get(k, "%x" % k), v))

        if not ok:
            return (False, "Emulation failed at %08X" % ea)

        if dbg:            
            print(out)
    
    return (True, (regs[REG_EDX] << 32) | regs[REG_EAX])

When the function starts, it populates the regs dictionary with the initial values of the registers. We use the op.reg as the key into that dictionary. Any uninitialized register will contain the value zero. The emulation function then enters a loop and decodes each instruction. For each instruction, it inspects its type (to know what operation to emulate) and its operands (to know how to retrieve the needed values). At the end of the loop, a 64-bit value is returned.

We can verify if the emulator is correct by comparing the results returned from the emulator against the results we captured earlier:

for i in range(0, challenge_funcs_tbl_size):
    func = idc.Dword(challenge_funcs_tbl +  i * 4)
    
    ok, info = scope_challenge_function(func)
    if ok:
        ok, val = emulate_challenge_function(info, 123, 456, dbg)
        if (val != RESULTS[i]):
            print("Mistmatch #%d: %16X vs %16X" % (i, val, RESULTS[i]))
            break
        
    else:
        print("Failed to scope challenge function #%d" % i)

I hope you found this article useful. Please do not hesitate to ask question and/or point out mistakes in this article. You can download the files used in this article from here:

Password: 123

You might also like:

6 thoughts on “Writing a simple x86 emulator with IDAPython

  1. Hi. Thank you for the tutorial.

    I cannot test it, but shoudn’t the `if inst.Op2.type == o_imm:` be `if inst.Op2.type == idaapi.o_imm:` ?

  2. Hi Elias,

    Could you explain why is using idc.GetMnem() or idc.GetDisasm() inefficient ?
    I did come across instances where these APIs returned incorrect instructions.

    • Hi Mark, the reason I say inefficient is because unless you really want the instruction or its operands in textual form then why ask for a string when you can get what you need in numerical format (insn.itype, etc.)?

      If you want to query disassembled instructions in a busy loop, decoding is faster than emitting the disassembling text.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.