Often times, when I stumble upon IDAPython scripts, I notice that they are using inefficient / incorrect IDAPython APIs to disassemble or decode instructions (for instance using idc.GetMnem() or idc.GetDisasm()). Therefore, in this blog post, I am going to illustrate how to use IDA’s instruction decoding functions from IDAPython in order to write a very simple x86 emulator. The goal is to demonstrate the correct use of instruction decoding IDAPython APIs. By the end of this post, you should be able to solve similar problems using IDAPython.
Table of Contents
Overview
Disassembling the program
Writing the emulator
Quick intro to instruction decoding
Scoping challenge functions
Emulating instructions
Overview
Let’s get started with the following sample program which contains a table of 12 challenge functions that get called in a loop and the result is displayed. Our goal is to write an emulator that can compute the challenge response value statically without using a 3rd party emulator.
#include <stdio.h> #include <cstdint> #include <time.h> #include <stdlib.h> typedef uint64_t (*challenge_proto_t)(uint32_t, uint32_t); //------------------------------------------------------------------------- uint64_t challenge_1(uint32_t c1, uint32_t c2) // 39 operation(s) { uint64_t result; __asm { pushad mov eax, [c1] mov edx, [c2] not eax dec edx xor edx, eax xor edx, eax inc eax not eax sub edx, 0x27c12466 inc eax dec edx not edx inc eax add eax, 0x273804ca xor edx, 0xaa5a1584 sub eax, edx not edx xor eax, 0xf94f7d8c dec edx dec eax sub eax, edx not edx dec edx sub edx, 0xd7b41b83 xor eax, 0xa551a9c7 add eax, eax dec eax inc eax not eax add edx, 0xa551b974 inc edx dec edx not edx xor eax, 0x200d519 not edx not eax sub edx, 0xeb15b7ef xor eax, 0xb2558b8c xor eax, 0xda288d90 not eax not edx mov dword ptr[result], eax mov dword ptr[result + 4], edx popad } return result; } // . // . // challenge_2 .. challenge_12 // . // . challenge_proto_t challenge_funcs[] = { challenge_1, challenge_2, challenge_3, challenge_4, challenge_5, challenge_6, challenge_7, challenge_8, challenge_9, challenge_10, challenge_11, challenge_12 }; //------------------------------------------------------------------------- int main(int argc, char *argv[]) { if (argc < 4) { printf("challenge func[0..%d] challenge1-32 challenge2-32 -> result-64\n", _countof(challenge_funcs)); return -1; } uint32_t f = atol(argv[1]) % _countof(challenge_funcs); uint32_t c1 = atol(argv[2]); uint32_t c2 = atol(argv[3]); printf("challenge_%d(%d, %d) -> %016I64X\n", f, c1, c2, challenge_funcs[f](c1, c2)); return 0; }
In real life, one can find him/herself in a similar situation wanting to treat a function as a blackbox instead of reversing it back into pseudo-code. In such cases, one has many choices:
- Use Appcall during runtime and query the challenge function directly.
- Use a decompiler to extract the algorithm in C pseudo-code and recompile it then use it.
- Extract the challenge function assembly listing, reassemble and use it.
- Use an emulation framework (the Unicorn engine for example) to emulate the challenge function.
Let’s compile this program and run it with c1=123 and c2=456:
C:\>for /l %a in (0, 1, 11) do @test.exe %a 123 456 challenge_0(123, 456) -> 8FDCE2E203FCAAF2 challenge_1(123, 456) -> E0317E1AB061ED8B challenge_2(123, 456) -> A0A0E0C2279BE734 challenge_3(123, 456) -> 5D18D0A79D07D7D8 challenge_4(123, 456) -> 2583B4EEB62E6042 challenge_5(123, 456) -> D5261E0275AB9805 challenge_6(123, 456) -> F2B3282E143F7927 challenge_7(123, 456) -> 9B9B3CBB0169F4CD challenge_8(123, 456) -> EF51086C5D1AF235 challenge_9(123, 456) -> FC8A97125C0EA232 challenge_10(123, 456) -> EEAE8BEB7996D2E7 challenge_11(123, 456) -> 4F36E6A65AB03929
Disassembling the program
Let’s disassemble the test program and locate the challenge_funcs table and the first challenge function:
; ; The challenge functions as referenced from main() ; (12 functions) ; .rdata:0041749C 00 10 40 00 challenge_funcs dd offset sub_401000 .rdata:0041749C ; DATA XREF: _main+57↑r .rdata:004174A0 90 10 40 00 dd offset sub_401090 .rdata:004174A4 30 11 40 00 dd offset sub_401130 .rdata:004174A8 D0 11 40 00 dd offset sub_4011D0 .rdata:004174AC 80 12 40 00 dd offset sub_401280 .rdata:004174B0 30 13 40 00 dd offset sub_401330 .rdata:004174B4 A0 13 40 00 dd offset sub_4013A0 .rdata:004174B8 30 14 40 00 dd offset sub_401430 .rdata:004174BC C0 14 40 00 dd offset sub_4014C0 .rdata:004174C0 30 15 40 00 dd offset sub_401530 .rdata:004174C4 E0 15 40 00 dd offset sub_4015E0 .rdata:004174C8 80 16 40 00 dd offset sub_401680 ; ; The first challenge function ; .text:00401000 ; int __cdecl sub_401000(int c1, int c2) .text:00401000 sub_401000 proc near .text:00401000 ; CODE XREF: _main+65↓p .text:00401000 ; DATA XREF: .rdata:challenge_funcs↓o .text:00401000 .text:00401000 var_8= dword ptr -8 .text:00401000 var_4= dword ptr -4 .text:00401000 c1= dword ptr 8 .text:00401000 c2= dword ptr 0Ch .text:00401000 .text:00401000 55 push ebp .text:00401001 8B EC mov ebp, esp .text:00401003 83 EC 08 sub esp, 8 .text:00401006 53 push ebx .text:00401007 56 push esi .text:00401008 57 push edi .text:00401009 60 pusha .text:0040100A 8B 45 08 mov eax, [ebp+8] .text:0040100D 8B 55 0C mov edx, [ebp+c2] .text:00401010 F7 D0 not eax .text:00401012 4A dec edx .text:00401013 33 D0 xor edx, eax .text:00401015 33 D0 xor edx, eax .text:00401017 40 inc eax .text:00401018 F7 D0 not eax .text:0040101A 81 EA 66 24 C1 27 sub edx, 27C12466h .text:00401020 40 inc eax .text:00401021 4A dec edx .text:00401022 F7 D2 not edx . . . .text:00401076 F7 D2 not edx .text:00401078 89 45 F8 mov [ebp+var_8], eax .text:0040107B 89 55 FC mov [ebp+var_4], edx .text:0040107E 61 popa .text:0040107F 8B 45 F8 mov eax, [ebp+var_8] .text:00401082 8B 55 FC mov edx, [ebp+var_4] .text:00401085 5F pop edi .text:00401086 5E pop esi .text:00401087 5B pop ebx .text:00401088 8B E5 mov esp, ebp .text:0040108A 5D pop ebp .text:0040108B C3 retn .text:0040108B .text:0040108B sub_401000 endp .text:0040108B
We could easily locate the challenge_funcs table because it is referenced from main(). The first challenge function, like all of the others, have a very distinct format/pattern on which we will base the emulator design.
We can distinctly see a pusha instruction (at 0x401009), followed by two instructions that load the initial values (at 0x40100A and 0x40100D), then a series of operations (between 0x401010 and 0x401076) on those registers and finally we see the results being copied back into local variables (at 0x401078 and 0x40107B) before popa is used to restore all registers.
We will use this code pattern to write a small function that identify the boundaries of the instructions that do the computation. We will then write another function that emulates code within a given range and returns the result.
Writing the emulator
In this section, we will implement two functions:
- scope_challenge_function(): this function finds the instructions boundaries to be emulated.
- emulate_challenge_function(): this function emulates instructions within a given range.
Before we get started, let’s define some global variables needed by the script:
import idc, idautils, idaapi challenge_funcs_tbl = 0x41749C challenge_funcs_tbl_size = 12 RESULTS = ( 0x8FDCE2E203FCAAF2, 0xE0317E1AB061ED8B, 0xA0A0E0C2279BE734, 0x5D18D0A79D07D7D8, 0x2583B4EEB62E6042, 0xD5261E0275AB9805, 0xF2B3282E143F7927, 0x9B9B3CBB0169F4CD, 0xEF51086C5D1AF235, 0xFC8A97125C0EA232, 0xEEAE8BEB7996D2E7, 0x4F36E6A65AB03929)
We deduced the challenge functions table and its size from the disassembly above. We also define the RESULTS variable containing the output of calling each challenge function with c1=123 and c2=456. We will use that table to verify the emulation after we are done.
To enumerate all the challenge functions in the table, we can do something like:
for i in range(0, challenge_funcs_tbl_size): func = idc.Dword(challenge_funcs_tbl + i * 4) print(">%x: challenge #%d" % (func, i + 1))
…and the output is:
>401000: challenge #1 >401090: challenge #2 >401130: challenge #3 >4011d0: challenge #4 >401280: challenge #5 >401330: challenge #6 >4013a0: challenge #7 >401430: challenge #8 >4014c0: challenge #9 >401530: challenge #10 >4015e0: challenge #11 >401680: challenge #12
Quick intro to instruction decoding
To decode instructions using IDAPython, use the idautils.DecodeInstruction() function:
# .text:0040101A 81 EA 66 24 C1 27 sub edx, 27C12466h inst = idautils.DecodeInstruction(0x40101A)
If decoding fails, then this function returns None. If decoding succeeds, we get an instruction object containing information about the instruction and its operands.
These are the important instruction attributes:
- inst.itype: this is an integer representing the instruction type. Different opcodes have the same itype and hence opcode != itype.
- inst.size: this is the size of the decoded instruction.
- inst.Operands[]: this is a zero based array containing operands information.
- inst.Op1 .. OpN: these are 1-based aliases into the Operands array.
- inst.ea: the linear address of the decoded instruction.
You might be wondering what is the relationship between an opcode and its itype? The answer is simple. In IDA, the open database’s processor module is responsible for filling the itype field based on the opcode. In the IDA SDK, you can find a header file called allins.hpp. This header file contains enums for all supported processor modules along with enum members for each supported instruction:
// Excerpt from allins.hpp // x86/x64 itypes enum { NN_null = 0, // Unknown Operation NN_aaa, // ASCII Adjust after Addition NN_aad, // ASCII Adjust AX before Division NN_aam, // ASCII Adjust AX after Multiply NN_aas, // ASCII Adjust AL after Subtraction . . . NN_jz, // Jump if Zero (ZF=1) NN_jmp, // Jump NN_jmpfi, // Indirect Far Jump NN_jmpni, // Indirect Near Jump NN_jmpshort, // Jump Short (not used) NN_lahf, // Load Flags into AH Register . . . // Pentium III Pseudo instructions NN_cmpeqps, // Packed Single-FP Compare EQ NN_cmpltps, // Packed Single-FP Compare LT NN_cmpleps, // Packed Single-FP Compare LE NN_cmpunordps, // Packed Single-FP Compare UNORD . . . }
I don’t know why, but the NN_ prefix is used for all the instruction types on the x86/x64 processor.
Here’s a simple example on how to decode and check the instruction type:
# .text:00402085 74 09 jz short loc_402090 inst = idautils.DecodeInstruction(0x402085) print("YES" if inst.itype == idaapi.NN_jz else "NO")
One can intuitively check what the decoded instruction is by comparing against one of the idaapi.NN_xxxx constants.
As for operands, one can access them via inst.Operands[] or inst.OpN. To get the number of operands used by the decoded instruction, you should not rely on the length of the Operands array because it will always resolve to UA_MAXOP == 8 (see ida.hpp). Instead, iterate over each operand and see if its type is o_void.
An instruction operand is defined using the op_t structure type defined in the ua.hpp header file.
Some of the operand members are:
- op.flags: operands flags.
- op.dtype: operand type. One of the dt_xxx consts. One can use this field to tell the size of the operand (1 == dt_byte, 2 == dt_word, etc.).
- op.type: operand type. One of the o_xxx consts.
- specflag1.. specflag4: processor specific flags.
These are the supported operand types (o_xxx):
- o_void: no operand present.
- o_reg: operand is a register (al, ax,es,ds…).
- o_mem: direct memory reference (DATA).
- o_phrase: memory Reference [Base Reg + Index Reg].
- o_displ: memory Reg [Base Reg + Index Reg + Displacement].
- o_imm: immediate value.
- o_far: immediate far Address (CODE).
- o_near: immediate near Address (CODE).
- o_idpspec0 .. o_idpspec5: processor specific flags.
There are additional operand members whose meaning differ based on the operand’s type:
- op.reg: the register number (o_reg).
- op.phrase: index register with memory accessing operands (o_phrase).
- op.value: the immediate value (o_imm) or the outer displacement (o_displ).
- op.addr: memory address used by the operand (o_mem, o_far, o_displ, o_near).
When the operand type is o_reg or o_phrase, then the op.reg/op.phrase values contain the register’s enum value. Like the NN_xxx terminology, the IDA SDK also provides the register constant names and their values; however this is only true for the x86/x64 processor module. Here’s an excerpt from the header file intel.hpp:
enum RegNo { R_ax = 0, R_cx, R_dx, R_bx, R_sp, R_bp, R_si, R_di, R_r8, R_r9, R_r10, R_r11, R_r12, R_r13, R_r14, R_r15, . . . }
Unfortunately though, those enums are not exposed to IDAPython, but at least we know enough to define something like the following:
REG_EAX = 0 REG_EDX = 2 REG_EBP = 5 . . . REG_NAMES = { REG_EAX: 'eax', REG_EDX: 'edx', REG_EBP: 'ebp' ...}
Here’s another example on how we can fully ‘disassemble’ an instruction:
# .text:0040106F 35 90 8D 28 DA xor eax, 0DA288D90h out = '' inst = idautils.DecodeInstruction(0x40106F) out += "XOR " if inst.itype == idaapi.NN_xor else "" out += "EAX" if (inst.Op1.type == idaapi.o_reg and inst.Op1.reg == 0) else "" out += ", 0x%08X" % inst.Op2.value if (inst.Op2.type == idaapi.o_imm) else "" print(out) # Outputs: XOR EAX, 0xDA288D90
That covers the instruction decoding principles. Please refer to header files intel.hpp, allins.hpp, ua.hpp and idp.hpp for more information.
Scoping the challenge functions
Earlier, we figured out how to go over the challenge functions table and retrieve the address of each challenge function. Let’s now write a function that uses the instruction decoder to find the boundaries of the instructions to be emulated.
Please note that I can use IDAPython’s FindBinary() but that defies the purpose of this article. For demonstration purposes, I want to find the code pattern in question using instruction decoding only:
def scope_challenge_function(func_ea): f = idaapi.get_func(func_ea) if f is None: return (False, "No function at address!") emu_start, emu_end = f.startEA, f.endEA ea = emu_start # # Find the start of the emulation pattern # stage = 0 while ea <= emu_end: inst = idautils.DecodeInstruction(ea) if inst is None: return (False, "Could not decode") # Advance to next instruction ea += inst.size # mov (eax|edx), [ebp+?] if (inst.itype == idaapi.NN_mov) and (inst.Operands[0].type == idaapi.o_reg) and \ (inst.Operands[1].type == idaapi.o_displ) and (inst.Operands[1].phrase == REG_EBP): # mov eax, [ebp+8] if (stage == 0) and (inst.Operands[0].reg == REG_EAX) and (inst.Operands[1].addr == 8): stage = 1 # mov edx, [ebp+0xC] elif (stage == 1) and (inst.Operands[0].reg == REG_EDX) and (inst.Operands[1].addr == 0xC): stage = 2 emu_start = ea elif (stage == 2) and (inst.itype == idaapi.NN_popa): # Let's decode backwards twice and double check the pattern ea2 = idc.PrevHead(ea) # Disassemble backwards for _ in range(0, 2): ea2 = idc.PrevHead(ea2) inst = idautils.DecodeInstruction(ea2) if (inst.itype == idaapi.NN_mov) and (inst.Op1.type == idaapi.o_displ) and \ (inst.Op1.reg == 5): if inst.Op2.reg == 2 and stage == 2: stage = 3 elif inst.Op2.reg == 0 and stage == 3: stage = 4 emu_end = ea2 break break if stage != 4: return (False, "Could not find markers") return (True, (emu_start, emu_end))
The basic pattern when decoding instructions is to advance the decoding address (the ea variable in this case) by inst.size after each successful decoding. Afterwards, one should check the instruction’s itype, then inspect the operands accordingly.
Note that at stage #2, I start decoding backwards. To go backwards in a proper disassembly listing, one can use the idc.PrevHead() function to retrieve the start address of the previously defined instruction (see line 37). Let’s test this function:
Python>ok, info = scope_challenge_function(0x401000) Python>if ok: print("start=%08X end=%08X" % (info[0], info[1])) start=00401010 end=00401078
Emulating instructions
In the previous step, we managed to retrieve a start and end address for the emulation boundaries. Now, let’s write a simple emulation function that only supports a limited set of instructions (NOT, DEC, INC, XOR, SUB and ADD):
def emulate_challenge_function(info, c1, c2, dbg = False): emu_start, emu_end = info if dbg: print("Emulating from %x to %x (%d, %d)" % (emu_start, emu_end, c1, c2)) # Reset registers regs = { REG_EAX: c1, REG_EDX: c2 } def get_opr_val(inst, regs): if inst.Op2.type == o_imm: return (True, inst.Op2.value) elif inst.Op2.type == idaapi.o_reg: return (True, regs[inst.Op2.reg]) else: return (False, 0) ea = emu_start while ea < emu_end: out = ">%x: " % ea ok = True inst = idautils.DecodeInstruction(ea) ea += inst.size if inst.itype == idaapi.NN_not: out += "NOT" regs[inst.Op1.reg] = ~regs.get(inst.Op1.reg, 0) & 0xffffffff elif inst.itype == idaapi.NN_dec and inst.Op1.type == idaapi.o_reg: out += "DEC" regs[inst.Op1.reg] = (regs.get(inst.Op1.reg, 0) - 1) & 0xffffffff elif inst.itype == idaapi.NN_inc and inst.Op1.type == idaapi.o_reg: out += "INC" regs[inst.Op1.reg] = (regs.get(inst.Op1.reg, 0) + 1) & 0xffffffff elif inst.itype == idaapi.NN_xor: ok, val = get_opr_val(inst, regs) regs[inst.Op1.reg] = (regs.get(inst.Op1.reg, 0) ^ val) & 0xffffffff out += "XOR %08X" % val elif inst.itype == idaapi.NN_sub: ok, val = get_opr_val(inst, regs) regs[inst.Op1.reg] = (regs.get(inst.Op1.reg, 0) - val) & 0xffffffff out += "SUB %08X" % val elif inst.itype == idaapi.NN_add: ok, val = get_opr_val(inst, regs) regs[inst.Op1.reg] = (regs.get(inst.Op1.reg, 0) + val) & 0xffffffff out += "ADD %08X" % val else: ok = False # Dump registers for k, v in regs.items(): out += (" [%s: %08X] " % (REG_NAMES.get(k, "%x" % k), v)) if not ok: return (False, "Emulation failed at %08X" % ea) if dbg: print(out) return (True, (regs[REG_EDX] << 32) | regs[REG_EAX])
When the function starts, it populates the regs dictionary with the initial values of the registers. We use the op.reg as the key into that dictionary. Any uninitialized register will contain the value zero. The emulation function then enters a loop and decodes each instruction. For each instruction, it inspects its type (to know what operation to emulate) and its operands (to know how to retrieve the needed values). At the end of the loop, a 64-bit value is returned.
We can verify if the emulator is correct by comparing the results returned from the emulator against the results we captured earlier:
for i in range(0, challenge_funcs_tbl_size): func = idc.Dword(challenge_funcs_tbl + i * 4) ok, info = scope_challenge_function(func) if ok: ok, val = emulate_challenge_function(info, 123, 456, dbg) if (val != RESULTS[i]): print("Mistmatch #%d: %16X vs %16X" % (i, val, RESULTS[i])) break else: print("Failed to scope challenge function #%d" % i)
I hope you found this article useful. Please do not hesitate to ask question and/or point out mistakes in this article. You can download the files used in this article from here:
You might also like:
Hi. Thank you for the tutorial.
I cannot test it, but shoudn’t the `if inst.Op2.type == o_imm:` be `if inst.Op2.type == idaapi.o_imm:` ?
Hi,
Thanks for the report. Yes that’s what I meant but the way it is works equally well.
idc.py also exposes that symbol and then init.py imports * from idc, so we are covered.
Understood, thank you!
Hi Elias,
Could you explain why is using idc.GetMnem() or idc.GetDisasm() inefficient ?
I did come across instances where these APIs returned incorrect instructions.
Hi Mark, the reason I say inefficient is because unless you really want the instruction or its operands in textual form then why ask for a string when you can get what you need in numerical format (insn.itype, etc.)?
If you want to query disassembled instructions in a busy loop, decoding is faster than emitting the disassembling text.
Thank you for the explanation. 🙂