scvyao

Reputation: 141

Why is a function's real address (ARM) different from the disassembly?

I want to see a function's address in my code, so I wrote a hello-world program like this:

#include <stdio.h>

void myfn() {
  printf("I am myfn1\n");
  printf("I am myfn2\n");
  printf("I am myfn3\n");
  printf("I am myfn4\n");
  printf("I am myfn5\n");
}

typedef void (*MYFN)();

int main() {
  MYFN fn = (MYFN)myfn;
  printf("addr of fn: 0x%08X\n", (unsigned int)fn);
  fn();
  printf("just for %s\n", "test");
  return 0;
}

And the result is:

# ./test
addr of fn: 0x00008461
I am myfn1
I am myfn2
I am myfn3
I am myfn4
I am myfn5
just for test

So, the address of myfn is 0x00008461?

Then I used objdump to disassemble it:

84ae:   f7ff efb8   blx 8420 <printf@plt>
84b2:   f7ff ffd5   bl  8460 <printf@plt+0x40>
84b6:   4807        ldr r0, [pc, #28]   ; (84d4 <printf@plt+0xb4>)
84b8:   4907        ldr r1, [pc, #28]   ; (84d8 <printf@plt+0xb8>)
84ba:   4478        add r0, pc
84bc:   4479        add r1, pc
84be:   f7ff efb0   blx 8420 <printf@plt>

From that, the address of myfn looks like 0x8460. The code near that address:

8460:   480a        ldr r0, [pc, #40]   ; (848c <printf@plt+0x6c>)
8462:   b510        push    {r4, lr}
8464:   4478        add r0, pc
8466:   f7ff efd6   blx 8414 <puts@plt>
846a:   4809        ldr r0, [pc, #36]   ; (8490 <printf@plt+0x70>)
846c:   4478        add r0, pc
846e:   f7ff efd2   blx 8414 <puts@plt>
8472:   4808        ldr r0, [pc, #32]   ; (8494 <printf@plt+0x74>)
8474:   4478        add r0, pc
8476:   f7ff efce   blx 8414 <puts@plt>
847a:   4807        ldr r0, [pc, #28]   ; (8498 <printf@plt+0x78>)
847c:   4478        add r0, pc
847e:   f7ff efca   blx 8414 <puts@plt>
8482:   4806        ldr r0, [pc, #24]   ; (849c <printf@plt+0x7c>)

Is the real address 0x8460, 0x8461, or 0x8462? Please help me...

Upvotes: 1

Views: 454

Answers (1)

old_timer

Reputation: 71546

This is Thumb code. Read the ARM ARM and the TRM (the ARM Architecture Reference Manual and the Technical Reference Manual for your core).

Look specifically at the BX and BLX instructions. They are used when branching to code that may be in a different instruction set (ARM versus Thumb, with or without the Thumb-2 extensions). That is the case here: the compiler doesn't know at compile time whether printf() is ARM or Thumb code, so the call has to be encoded with bx or blx. If it were branching to something being compiled at the same time, it could use an ordinary branch instead, as the bl to 0x8460 in your dump does. When bx or blx takes its target from a register, the least significant bit (lsbit) of that address tells the instruction which mode it is calling: zero means ARM instructions, one means Thumb instructions. In Thumb mode the program counter does not keep that lsbit set; the bx/blx instruction strips it.

The linker comes along afterwards, knows which mode each function uses, and fills in the appropriate addresses. So the function starts in memory at address 0x8460, but to branch to (call) it using bx or blx you need the address 0x8461, because those are Thumb instructions.

The compiler doesn't know why you need the address of the function. Pretty much everywhere the linker fills in a function address, it has to control that lsbit based on the mode, so here it is apparently setting it to one.

The address in question is 0x8460. If you have some reason for needing the real address rather than the call-to address, just strip off the lsbit:

printf("addr of fn: 0x%08X\n", (unsigned int)fn & ~1u);

Upvotes: 1
