Reputation: 29
I want to write a Python script that extracts a function's opcodes from an ELF binary, given its address (e.g. 0x437310) and size. How can I map this address to the corresponding offset in the binary file so that I can start reading from it?
Using a hex editor, I can figure out that the function at 0x437310 starts at offset 0x37310 in the hexdump.
How can I calculate this in a generic way, since the image base of a binary is not always the same?
Any help will be appreciated.
Upvotes: 3
Views: 2507
Reputation: 9325
Let's say I want to extract the instructions of maybe_make_export_env from bash.
The first thing you want to do is find this symbol in the symbol table:
$ readelf -s /bin/bash
Num: Value Size Type Bind Vis Ndx Name
[...]
216: 000000000043ed80 18 FUNC GLOBAL DEFAULT 14 maybe_make_export_env
[...]
This gives us the address of the function in memory (0x43ed80) and its length (18).
We now have the address in memory (in the process image) and want to find the corresponding offset in the file. To do that, we need to look at the program header table:
$ readelf -l /bin/bash
Program Headers:
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
PHDR 0x0000000000000040 0x0000000000400040 0x0000000000400040
0x00000000000001f8 0x00000000000001f8 R E 8
INTERP 0x0000000000000238 0x0000000000400238 0x0000000000400238
0x000000000000001c 0x000000000000001c R 1
[Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
LOAD 0x0000000000000000 0x0000000000400000 0x0000000000400000
0x00000000000f3ad4 0x00000000000f3ad4 R E 200000
LOAD 0x00000000000f3de0 0x00000000006f3de0 0x00000000006f3de0
0x0000000000008ea8 0x000000000000ea78 RW 200000
DYNAMIC 0x00000000000f3df8 0x00000000006f3df8 0x00000000006f3df8
0x0000000000000200 0x0000000000000200 RW 8
NOTE 0x0000000000000254 0x0000000000400254 0x0000000000400254
0x0000000000000044 0x0000000000000044 R 4
GNU_EH_FRAME 0x00000000000d8ab0 0x00000000004d8ab0 0x00000000004d8ab0
0x0000000000004094 0x0000000000004094 R 4
GNU_STACK 0x0000000000000000 0x0000000000000000 0x0000000000000000
0x0000000000000000 0x0000000000000000 RW 10
GNU_RELRO 0x00000000000f3de0 0x00000000006f3de0 0x00000000006f3de0
0x0000000000000220 0x0000000000000220 R 1
We want to find which PT_LOAD entry contains this address (based on VirtAddr and MemSiz). The first PT_LOAD entry ranges from 0x400000 to 0x400000 + 0xf3ad4 = 0x4f3ad4 (excluded), so the symbol belongs to this PT_LOAD entry. Here FileSiz equals MemSiz, so the whole range is backed by bytes in the file.
We can find the location of the function in the file with: symbol_value - VirtAddr + Offset = 0x43ed80 - 0x400000 + 0x0 = 0x3ed80.
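The lookup and the arithmetic above can be sketched in Python. This is only an illustration: the segment values are copied by hand from the two LOAD entries in the readelf output, and the names LOAD_SEGMENTS and vaddr_to_offset are made up for this example. It compares against FileSiz rather than MemSiz, on the assumption that you only care about bytes actually present in the file.

```python
# (Offset, VirtAddr, FileSiz, MemSiz) of the two LOAD entries shown by
# `readelf -l /bin/bash` above, copied by hand for illustration.
LOAD_SEGMENTS = [
    (0x000000, 0x400000, 0xF3AD4, 0xF3AD4),
    (0x0F3DE0, 0x6F3DE0, 0x08EA8, 0x0EA78),
]


def vaddr_to_offset(segments, vaddr):
    """Return the file offset backing vaddr, or None if no segment holds it."""
    for offset, seg_vaddr, filesz, _memsz in segments:
        # Only the first FileSiz bytes of a segment exist in the file; the
        # remainder up to MemSiz is zero-filled at load time (e.g. .bss).
        if seg_vaddr <= vaddr < seg_vaddr + filesz:
            return vaddr - seg_vaddr + offset
    return None


print(hex(vaddr_to_offset(LOAD_SEGMENTS, 0x43ED80)))  # 0x3ed80
```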
This is the relevant part of the file:
0003ed80: 8b05 3260 2b00 85c0 7406 e911 feff ff90 ..2`+...t.......
0003ed90: f3c3 0f1f 4000 662e 0f1f 8400 0000 0000 ....@.f.........
These are indeed the same bytes as those shown by objdump -d /bin/bash:
000000000043ed80 <maybe_make_export_env@@Base>:
43ed80: 8b 05 32 60 2b 00 mov 0x2b6032(%rip),%eax # 6f4db8 <array_needs_making@@Base>
43ed86: 85 c0 test %eax,%eax
43ed88: 74 06 je 43ed90 <maybe_make_export_env@@Base+0x10>
43ed8a: e9 11 fe ff ff jmpq 43eba0 <bind_global_variable@@Base+0x60>
43ed8f: 90 nop
43ed90: f3 c3 repz retq
43ed92: 0f 1f 40 00 nopl 0x0(%rax)
43ed96: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
43ed9d: 00 00 00
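To do the whole thing from a script without calling readelf, one option is to parse the program headers yourself with the struct module. The following is a minimal sketch, not a complete ELF parser: the function names read_load_segments and read_function_bytes are hypothetical, it only looks at PT_LOAD entries, and it assumes the address you pass is the one stored in the file (a symbol value or an objdump address). For a position-independent executable observed at runtime, you would first have to subtract the load base.

```python
import struct

PT_LOAD = 1


def read_load_segments(f):
    """Return [(p_offset, p_vaddr, p_filesz)] for every PT_LOAD entry."""
    f.seek(0)
    ident = f.read(16)
    if ident[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    is64 = ident[4] == 2                    # EI_CLASS: 1 = 32-bit, 2 = 64-bit
    endian = "<" if ident[5] == 1 else ">"  # EI_DATA: 1 = little, 2 = big
    # Rest of the ELF header after e_ident (e_type ... e_shstrndx).
    hdr_fmt = endian + ("HHIQQQIHHHHHH" if is64 else "HHIIIIIHHHHHH")
    hdr = struct.unpack(hdr_fmt, f.read(struct.calcsize(hdr_fmt)))
    e_phoff, e_phentsize, e_phnum = hdr[4], hdr[8], hdr[9]

    # The field order of a program header differs between ELF32 and ELF64.
    ph_fmt = endian + ("IIQQQQQQ" if is64 else "IIIIIIII")
    segments = []
    for i in range(e_phnum):
        f.seek(e_phoff + i * e_phentsize)
        ph = struct.unpack(ph_fmt, f.read(struct.calcsize(ph_fmt)))
        if is64:
            p_type, _flags, p_offset, p_vaddr, _paddr, p_filesz = ph[:6]
        else:
            p_type, p_offset, p_vaddr, _paddr, p_filesz = ph[:5]
        if p_type == PT_LOAD:
            segments.append((p_offset, p_vaddr, p_filesz))
    return segments


def read_function_bytes(f, vaddr, size):
    """Read `size` bytes located at virtual address `vaddr` from the file."""
    for p_offset, p_vaddr, p_filesz in read_load_segments(f):
        # The whole range must lie inside the file-backed part of the segment.
        if p_vaddr <= vaddr and vaddr + size <= p_vaddr + p_filesz:
            f.seek(vaddr - p_vaddr + p_offset)
            return f.read(size)
    raise ValueError("address range is not backed by any PT_LOAD segment")


# Example use (path, address and size taken from the bash example above):
#   with open("/bin/bash", "rb") as f:
#       code = read_function_bytes(f, 0x43ED80, 18)
```

If you would rather not parse the headers by hand, a library such as pyelftools exposes the same program header fields.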
Upvotes: 3