Reputation: 11

Use of LR and PC instructions in non-leaf and leaf functions epilogue

I am trying to learn assembly through the guide from azeria-labs.com

I have a question about the use of the LR register and the PC register in the epilogue of non-leaf and leaf functions.

In the snippet below they show the difference for the epilogue in these functions.

If I write a program in C and look at it in GDB, it always uses "pop {r11, pc}" for a non-leaf function and "pop {r11}; bx lr" for a leaf function. Can anybody tell me why this is?

When I am in a leaf function, does it make a difference whether I use "bx lr" or "pop pc" to return to the parent function?

/* An epilogue of a leaf function */ 
pop    {r11}        
bx     lr           

/* An epilogue of a non-leaf function */
pop    {r11, pc}

Upvotes: 1

Answers (2)

artless-noise-bye-due2AI

Reputation: 22430

I am trying to learn assembly

I have a question about the use of the LR register and the PC register in the epilogue of non-leaf and leaf functions.

This is part of the beauty and pain of assembler. There are no rules for the use of anything. It is up to you to decide what is needed. Please see: ARM Link and frame pointer, as it may be helpful.

... it will always use pop {r11, pc} for a non-leaf function and pop {r11}; bx lr for a leaf function. Can anybody tell me why this is?

A C compiler is different. It follows a set of rules called an ABI. The current ARM procedure call standard is the AAPCS, which covers both ARM and Thumb code and supersedes the older APCS and ATPCS (the Thumb variant). These rules exist so that different compilers can call each other's functions^note1, i.e., so that tools can interoperate. You can follow these rules in assembler or you can disregard them, but if your goal is to interoperate with a compiler's code, you need to follow the ABI's rules.

Some of the rules say what needs to be pushed on the stack and how registers are used. The reason the leaf function is different is efficiency: leaving the return address in the register lr is much faster than going through memory (pushing it to the stack and popping it again). In a non-leaf function, a function call (bl) overwrites the existing lr, and you would not return to the right place afterwards, so lr is pushed to the stack to make things work.
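The clobbering of lr described above can be sketched with a toy model in C. This is not real ARM code: the ToyCpu struct, the toy_* helpers and the example addresses are all hypothetical names made up for illustration, and only the bookkeeping of lr and the stack is modelled, not the jumps themselves.

```c
#include <stdint.h>

/* Toy model only: "addresses" are arbitrary integers, not real code addresses. */
#define RETURN_INTO_NONLEAF 0x200u  /* where the leaf should return to (assumed value) */

typedef struct {
    uint32_t lr;        /* link register */
    uint32_t stack[8];  /* tiny stack, modelled as an array */
    int      sp;        /* number of words currently pushed */
} ToyCpu;

/* bl: record the return address in lr (the branch itself is not modelled). */
static void toy_bl(ToyCpu *cpu, uint32_t return_addr) { cpu->lr = return_addr; }

/* push {lr}: save the current lr on the stack. */
static void toy_push_lr(ToyCpu *cpu) { cpu->stack[cpu->sp++] = cpu->lr; }

/* pop {pc}: the value popped is the address execution continues at. */
static uint32_t toy_pop_pc(ToyCpu *cpu) { return cpu->stack[--cpu->sp]; }

/* bx lr: execution continues at whatever lr currently holds. */
static uint32_t toy_bx_lr(const ToyCpu *cpu) { return cpu->lr; }

/* Non-leaf function that does NOT save lr: returns the address it ends up at. */
uint32_t nonleaf_without_save(uint32_t caller_return)
{
    ToyCpu cpu = {0};
    toy_bl(&cpu, caller_return);        /* caller: bl nonleaf            */
    toy_bl(&cpu, RETURN_INTO_NONLEAF);  /* nonleaf: bl leaf, lr is lost  */
    return toy_bx_lr(&cpu);             /* nonleaf: bx lr, wrong address */
}

/* Non-leaf function with the usual push {lr} ... pop {pc} pair. */
uint32_t nonleaf_with_save(uint32_t caller_return)
{
    ToyCpu cpu = {0};
    toy_bl(&cpu, caller_return);        /* caller: bl nonleaf            */
    toy_push_lr(&cpu);                  /* prologue: push {lr}           */
    toy_bl(&cpu, RETURN_INTO_NONLEAF);  /* nonleaf: bl leaf              */
    return toy_pop_pc(&cpu);            /* epilogue: pop {pc}            */
}
```

In this model, nonleaf_without_save hands back the address inside the non-leaf function instead of the caller's address, while nonleaf_with_save hands back the caller's address. A leaf function never executes the second bl, so its lr is still intact when it reaches bx lr.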

When I am in a leaf function, does it make a difference whether I use "bx lr" or "pop pc" to return to the parent function?

The bx lr is faster than the pop pc because one uses memory and the other does not. Functionally they are the same, provided lr was pushed in the prologue so that the pop has a return address to load. However, one common reason to use assembler is to be faster. You will end up with the same execution path, it will just take longer; how much longer depends on the memory system. It could be next to negligible for a Cortex-M with TCM or very high for Cortex-A CPUs.

The ARM uses registers to pass parameters because this is faster than pushing parameters on the stack. Consider this code,

int foo(int a, int b, int c) {return a+b+c;}
int bar(int a) { return foo(a, 1, 2);}

Here is one possible ARM implementation that passes the parameters on the stack ^note2,

  foo:
    pop {r0, r1}    @ r0 = c, r1 = b (the last two values pushed)
    add r0, r0, r1  @ only two registers needed.
    pop {r1}        @ r1 = a
    add r0, r0, r1
    bx  lr

  bar:
    push {lr}
    push {r0}       @ notice we are only using one register?
    mov r0, #1
    push {r0}
    mov r0, #2
    push {r0}
    bl foo
    pop {pc}

No ARM compiler will do things this way. The convention is to use R0, R1, and R2 to pass the parameters, because this is faster and actually produces less code. Either way achieves the same thing. With register passing, the code might be,

  foo:
    add r0, r0, r1  @ a = a + b
    add r0, r0, r2  @ a = a + c
    bx  lr

  bar:
    push {lr}       @ save lr; a is already in r0 from the caller of bar.
    mov r1, #1      @ b = 1
    mov r2, #2      @ c = 2
    bl foo
    pop {pc}

The lr is somewhat similar to the parameters. You could push the parameters on the stack or just leave them in registers. You could put the lr on the stack and then pop it off later, or you can just leave it in the register. What should not be underestimated is how much faster code can become when it uses registers as opposed to memory. Moving things around is generally a sign that assembler code is not optimal. The more mov, push and pop you have, the slower your code is.

So generally quite a bit of thought went into the ABI to make it as fast as possible. The older APCS is slightly slower than the newer AAPCS, but they both work.

Note1: You will notice a difference between static and non-static functions if you turn up optimizations. This is because the compiler may ignore the ABI to be faster. Static functions cannot be called by another compiler and don't need to interoperate.
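A minimal sketch for observing this yourself, assuming a C compiler that targets ARM. The function names add3_public, add3_local and call_both are hypothetical; the two adders have identical bodies and differ only in linkage.

```c
/* External linkage: other translation units (and other compilers) may call
 * this, so the generated code has to follow the ABI. */
int add3_public(int a, int b, int c)
{
    return a + b + c;
}

/* Internal linkage: only this file can call it, so an optimizing compiler
 * is free to inline it or to use whatever calling sequence it likes. */
static int add3_local(int a, int b, int c)
{
    return a + b + c;
}

/* Calls both versions so that neither is discarded as unused. */
int call_both(int a)
{
    return add3_public(a, 1, 2) + add3_local(a, 1, 2);
}
```

Compiling this to assembly (for example with the -S option of GCC or Clang) once without optimization and once with -O2, then comparing the output, shows how the two functions are treated. What the optimizer actually does with the static one depends on the compiler and version.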

Note2: In fact, the CPU designers think a lot about the ABI and take into consideration how many registers to provide. Too many registers and the opcodes will be big. Too few and there will be lots of memory used instead of registers.

Upvotes: 3

Thomas Jager

Reputation: 5265

In the leaf function, there are no other function calls which would modify the link register lr.

For a non-leaf function, the lr must be preserved, done here by pushing it to the stack (somewhere not shown, earlier in the function).

The epilogue of the non-leaf function could be rewritten:

pop    {r11, lr}
bx     lr

This is, however, one more instruction, so it is slightly less efficient.

Upvotes: 0
