الطيب

Reputation: 29

Extract function bytes from ELF binary file

I want to write a Python script that extracts a function's opcodes from an ELF binary, given the function's address (e.g. 0x437310) and size. How can I map this address to the corresponding offset in the binary file, so I know where to start reading?

Using a hex editor, I can see that the function at 0x437310 starts at offset 0x37310 in the hexdump.

How can I calculate this in a generic way, given that the image base of a binary is not always the same?

Any help would be appreciated.

Upvotes: 3

Views: 2507

Answers (1)

ysdx

Reputation: 9325

Let's say I want to extract the instructions of maybe_make_export_env from bash.

The first thing you want to do is find this symbol in the symbol table:

$ readelf -s /bin/bash
   Num:    Value          Size Type    Bind   Vis      Ndx Name
[...]
   216: 000000000043ed80    18 FUNC    GLOBAL DEFAULT   14 maybe_make_export_env
[...]

This gives us the address of the function in memory (0x43ed80) and its length (18 bytes).

We now have the address in memory (in the process image) and want to find the corresponding offset in the file. To do that, we need to look at the program header table:

$ readelf -l /bin/bash
Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  PHDR           0x0000000000000040 0x0000000000400040 0x0000000000400040
                 0x00000000000001f8 0x00000000000001f8  R E    8
  INTERP         0x0000000000000238 0x0000000000400238 0x0000000000400238
                 0x000000000000001c 0x000000000000001c  R      1
      [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
  LOAD           0x0000000000000000 0x0000000000400000 0x0000000000400000
                 0x00000000000f3ad4 0x00000000000f3ad4  R E    200000
  LOAD           0x00000000000f3de0 0x00000000006f3de0 0x00000000006f3de0
                 0x0000000000008ea8 0x000000000000ea78  RW     200000
  DYNAMIC        0x00000000000f3df8 0x00000000006f3df8 0x00000000006f3df8
                 0x0000000000000200 0x0000000000000200  RW     8
  NOTE           0x0000000000000254 0x0000000000400254 0x0000000000400254
                 0x0000000000000044 0x0000000000000044  R      4
  GNU_EH_FRAME   0x00000000000d8ab0 0x00000000004d8ab0 0x00000000004d8ab0
                 0x0000000000004094 0x0000000000004094  R      4
  GNU_STACK      0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x0000000000000000 0x0000000000000000  RW     10
  GNU_RELRO      0x00000000000f3de0 0x00000000006f3de0 0x00000000006f3de0
                 0x0000000000000220 0x0000000000000220  R      1

We want to find which PT_LOAD entry contains this address (based on VirtAddr and MemSiz). The first PT_LOAD entry ranges from 0x400000 to 0x400000 + 0xf3ad4 = 0x4f3ad4 (excluded), so the symbol belongs to this entry. Here FileSiz equals MemSiz, so the whole range is backed by bytes in the file.

We can find the location of the function in the file with: symbol_value - VirtAddr + Offset = 0x43ed80 - 0x400000 + 0x0 = 0x3ed80.
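The same calculation can be sketched in Python. This is only an illustration: the names LoadSegment, vaddr_to_offset and BASH_LOAD_SEGMENTS are made up for this example, and the segment values are copied by hand from the readelf -l output above. The check uses FileSiz rather than MemSiz, on the assumption that you only want addresses whose bytes are actually present in the file (the tail of a segment beyond FileSiz, such as .bss, is not).

```python
from typing import NamedTuple, Optional, Sequence


class LoadSegment(NamedTuple):
    """One PT_LOAD entry (hypothetical helper type for this sketch)."""
    offset: int  # Offset column: where the segment starts in the file
    vaddr: int   # VirtAddr column: where the segment starts in memory
    filesz: int  # FileSiz column: number of bytes present in the file
    memsz: int   # MemSiz column: number of bytes occupied in memory


def vaddr_to_offset(vaddr: int, segments: Sequence[LoadSegment]) -> Optional[int]:
    """Return the file offset for vaddr, or None if no segment backs it."""
    for seg in segments:
        # Only the first filesz bytes of a segment exist in the file.
        if seg.vaddr <= vaddr < seg.vaddr + seg.filesz:
            return vaddr - seg.vaddr + seg.offset
    return None


# The two PT_LOAD entries from the readelf output above.
BASH_LOAD_SEGMENTS = [
    LoadSegment(offset=0x0, vaddr=0x400000, filesz=0xF3AD4, memsz=0xF3AD4),
    LoadSegment(offset=0xF3DE0, vaddr=0x6F3DE0, filesz=0x8EA8, memsz=0xEA78),
]

print(hex(vaddr_to_offset(0x43ED80, BASH_LOAD_SEGMENTS)))  # prints 0x3ed80
```

Because the base address comes from the program headers rather than being hard-coded, the same function should work for binaries linked at a different image base.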

This is the relevant part of the file:

0003ed80: 8b05 3260 2b00 85c0 7406 e911 feff ff90  ..2`+...t.......
0003ed90: f3c3 0f1f 4000 662e 0f1f 8400 0000 0000  ....@.f.........

These are indeed the same bytes as the ones shown by objdump -d /bin/bash:

000000000043ed80 <maybe_make_export_env@@Base>:
  43ed80:       8b 05 32 60 2b 00       mov    0x2b6032(%rip),%eax        # 6f4db8 <array_needs_making@@Base>
  43ed86:       85 c0                   test   %eax,%eax
  43ed88:       74 06                   je     43ed90 <maybe_make_export_env@@Base+0x10>
  43ed8a:       e9 11 fe ff ff          jmpq   43eba0 <bind_global_variable@@Base+0x60>
  43ed8f:       90                      nop
  43ed90:       f3 c3                   repz retq 
  43ed92:       0f 1f 40 00             nopl   0x0(%rax)
  43ed96:       66 2e 0f 1f 84 00 00    nopw   %cs:0x0(%rax,%rax,1)
  43ed9d:       00 00 00
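Putting the steps together, here is a minimal sketch of a script that reads the program headers itself and extracts the bytes, using only the standard library. It is written under some assumptions: it only handles little-endian ELF64 files (like the x86-64 bash above), it takes the address and size as given rather than looking them up in the symbol table, and read_function_bytes is a name invented for this example.

```python
import struct

PT_LOAD = 1  # p_type value of a loadable segment


def read_function_bytes(path: str, vaddr: int, size: int) -> bytes:
    """Read size bytes located at virtual address vaddr from an ELF64 file."""
    with open(path, "rb") as f:
        ehdr = f.read(64)  # the ELF64 file header is 64 bytes long
        # e_ident: magic, then EI_CLASS (2 = 64-bit), EI_DATA (1 = little-endian)
        if len(ehdr) < 64 or ehdr[:4] != b"\x7fELF" or ehdr[4] != 2 or ehdr[5] != 1:
            raise ValueError("only little-endian ELF64 files are handled here")
        (e_phoff,) = struct.unpack_from("<Q", ehdr, 32)
        e_phentsize, e_phnum = struct.unpack_from("<HH", ehdr, 54)

        for i in range(e_phnum):
            f.seek(e_phoff + i * e_phentsize)
            # Elf64_Phdr: p_type, p_flags, p_offset, p_vaddr, p_paddr,
            #             p_filesz, p_memsz, p_align
            (p_type, _flags, p_offset, p_vaddr,
             _paddr, p_filesz, _memsz, _align) = struct.unpack("<IIQQQQQQ", f.read(56))
            # The whole function must lie in the file-backed part of the segment.
            if p_type == PT_LOAD and p_vaddr <= vaddr and vaddr + size <= p_vaddr + p_filesz:
                f.seek(vaddr - p_vaddr + p_offset)
                return f.read(size)
    raise ValueError("address range is not backed by any PT_LOAD segment")


# Example call (address and size taken from the readelf -s output above):
# code = read_function_bytes("/bin/bash", 0x43ED80, 18)
# print(code.hex())
```

For anything beyond a quick script (32-bit or big-endian files, symbol lookup by name), a dedicated ELF parsing library is probably a better choice than hand-written struct formats.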

Upvotes: 3
