agelos d

Reputation: 11

qemu-arm branches to a seemingly unrelated instruction

I am trying to build a binary translator for ARM bare-metal code, and I am verifying that it follows the correct execution flow by comparing it against qemu-arm. I use the following command to dump the program flow:

qemu-arm -d in_asm,cpu -singlestep -D a.flow a.out

I noticed something strange: the program seems to jump to an unrelated instruction, since 0x000080b4 is neither the branch target nor the instruction following 0x000093ec.

0x000093ec:  1afffff9      bne  0x93d8

R00=00000000 R01=00009c44 R02=00000002 R03=00000000
R04=00000001 R05=0001d028 R06=00000002 R07=00000000
R08=00000000 R09=00000000 R10=0001d024 R11=00000000
R12=f6ffed88 R13=f6ffed88 R14=000093e8 R15=000093ec
PSR=20000010 --C- A usr32
R00=00000000 R01=00009c44 R02=00000002 R03=00000000
R04=00000001 R05=0001d028 R06=00000002 R07=00000000
R08=00000000 R09=00000000 R10=0001d024 R11=00000000
R12=f6ffed88 R13=f6ffed88 R14=000093e8 R15=000093d8
PSR=20000010 --C- A usr32
----------------
IN: 
0x000080b4:  e59f3060      ldr  r3, [pc, #96]   ; 0x811c

The instruction that actually executes corresponds to the beginning of the <frame_dummy> symbol in the disassembly. Can someone explain what actually happens inside the emulator, and whether this behavior is normal on ARM? The program was compiled with:

arm-none-eabi-gcc --specs=rdimon.specs a.c

Here is the same segment of the program flow without the CPU state:

0x0000804c:  e59f3018      ldr  r3, [pc, #24]   ; 0x806c
0x00008050:  e3530000      cmp  r3, #0  ; 0x0
0x00008054:  01a0f00e      moveq    pc, lr

----------------
IN: __libc_init_array
0x000093e8:  e1560004      cmp  r6, r4
0x000093ec:  1afffff9      bne  0x93d8

----------------
IN: 
0x000080b4:  e59f3060      ldr  r3, [pc, #96]   ; 0x811c
0x000080b8:  e3530000      cmp  r3, #0  ; 0x0
0x000080bc:  0a000009      beq  0x80e8

This is the disassembly of this part:

93d4:   0a000005    beq 93f0 <__libc_init_array+0x68>
93d8:   e2844001    add r4, r4, #1
93dc:   e4953004    ldr r3, [r5], #4
93e0:   e1a0e00f    mov lr, pc
93e4:   e1a0f003    mov pc, r3
93e8:   e1560004    cmp r6, r4
93ec:   1afffff9    bne 93d8 <__libc_init_array+0x50>
93f0:   e8bd4070    pop {r4, r5, r6, lr}

Upvotes: 1

Views: 202

Answers (1)

Jester

Reputation: 58762

It is a backward jump to a translation block (TB) that QEMU has already generated, and you don't have to look very far back in the log to find it:

IN: __libc_init_array
0x000093d8:  e2844001      add  r4, r4, #1  ; 0x1
0x000093dc:  e4953004      ldr  r3, [r5], #4
0x000093e0:  e1a0e00f      mov  lr, pc
0x000093e4:  e1a0f003      mov  pc, r3

----------------
IN: register_fini
0x0000804c:  e59f3018      ldr  r3, [pc, #24]   ; 0x806c
0x00008050:  e3530000      cmp  r3, #0  ; 0x0
0x00008054:  01a0f00e      moveq    pc, lr

----------------
IN: __libc_init_array
0x000093e8:  e1560004      cmp  r6, r4
0x000093ec:  1afffff9      bne  0x93d8

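Since that block was already translated, QEMU does not log it again. Notice that this loop iterates over function pointers: the first one points to register_fini and the second one to the mysterious 0x000080b4 address in question (the log shows no symbol for it). When register_fini conditionally returns with moveq pc, lr, control goes back to __libc_init_array at 0x000093e8. The end of the array has not been reached yet (r4=1, r6=2 in the question's register dump), so bne jumps to the already-translated block at 0x000093d8, which loads the second pointer, 0x000080b4, and calls it; that new block is the one you see logged. When that function in turn returns to 0x000093e8, the loop determines that the end of the array has been reached and falls through to 0x000093f0, which returns to the caller.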

Upvotes: 5
