Torgon

Reputation: 131

Does the ARM calling convention allow a function to not store LR to the stack?

As the title says, I'm having trouble understanding the calling convention for the ARM architecture. In particular, I'm still not sure what happens to the LR register when you call a subroutine.

I think the most obvious and safest way to handle LR on entry to a subroutine is to store it on the stack, but the documentation doesn't describe that behaviour, so I came up with the following example.

I'll write it in C because I think it's easier to explain that way. Imagine you have only two functions:

void function_1(void){
   //some code here
}

void function_2(void){
   //some code here
   function_1();
   //some code here
}

As I said before, what I would do with LR inside function_1 is store its value on the stack. But if you look closer, function_1 doesn't call any other subroutine, so that would be unnecessary.

Is it possible that an ARM compiler would decide not to store LR on the stack?

I read about the calling standard on this page of the ARM Infocenter.

Upvotes: 4

Views: 936

Answers (1)

Peter Cordes

Reputation: 364458

The calling convention only defines what registers are call-preserved vs. call-clobbered, and where to find stack args.

It's 100% up to the function how it goes about making sure its return address is available somewhere when it's ready to return. The most trivial and efficient way to handle that is to just leave it in LR the whole time, in a leaf function. (A function that doesn't call others: it's a leaf in the call graph / tree).

Compilers in practice will usually just leave it in LR in leaf functions, even with optimization disabled. GCC, for example, sets up a frame pointer with optimization disabled, but still doesn't store/reload LR in a leaf function unless it needs so many scratch registers that it wants to use LR as one of them.

Otherwise, in non-leaf functions, compilers will typically store it to the stack. But if they wanted to, they could instead save R4 to the stack and mov r4, lr, then restore LR from R4 and reload R4 from the stack when they're ready to return.
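
To make the options above concrete, here is a minimal sketch in plain C of a toy machine that tracks only LR, R4 and a small stack. It is not real ARM semantics, and every name in it (ToyCpu, toy_bl, nonleaf_via_r4, the addresses 0x1000 and 0x2000) is made up for illustration. Each function returns the address it would jump back to, so you can see which strategies hand control back to the right caller.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model, not real ARM semantics: just enough state to follow LR. */
typedef struct {
    uint32_t lr;         /* link register */
    uint32_t r4;         /* a call-preserved register */
    uint32_t stack[8];   /* tiny full-descending stack */
    int sp;              /* index of the current top-of-stack slot */
} ToyCpu;

void toy_init(ToyCpu *c) { c->lr = 0; c->r4 = 0; c->sp = 8; }
void toy_push(ToyCpu *c, uint32_t v) { assert(c->sp > 0); c->stack[--c->sp] = v; }
uint32_t toy_pop(ToyCpu *c) { assert(c->sp < 8); return c->stack[c->sp++]; }

/* bl overwrites LR with the new return address; the old value is lost. */
void toy_bl(ToyCpu *c, uint32_t return_addr) { c->lr = return_addr; }

/* Leaf: makes no calls, so LR is still intact at the end (bx lr). */
uint32_t leaf(ToyCpu *c) { return c->lr; }

/* Non-leaf, usual strategy: push {r4, lr} ... pop {r4, pc}. */
uint32_t nonleaf_stack(ToyCpu *c) {
    toy_push(c, c->lr);          /* lr lands at the higher address */
    toy_push(c, c->r4);
    toy_bl(c, 0x2000u);          /* nested call clobbers LR */
    (void)leaf(c);
    c->r4 = toy_pop(c);
    return toy_pop(c);           /* saved LR goes straight into pc */
}

/* Non-leaf, alternative: keep the return address in call-preserved R4. */
uint32_t nonleaf_via_r4(ToyCpu *c) {
    toy_push(c, c->r4);          /* the caller's R4 must survive */
    c->r4 = c->lr;               /* mov r4, lr */
    toy_bl(c, 0x2000u);
    (void)leaf(c);
    c->lr = c->r4;               /* mov lr, r4 */
    c->r4 = toy_pop(c);
    return c->lr;                /* bx lr */
}

/* Broken non-leaf: never saves LR, so it "returns" into itself. */
uint32_t nonleaf_broken(ToyCpu *c) {
    toy_bl(c, 0x2000u);
    (void)leaf(c);
    return c->lr;                /* 0x2000, not the caller's address */
}
```

If a caller does toy_bl(&c, 0x1000) before each call, the leaf and both saving variants come back with 0x1000 and leave R4 and the stack as they found them, while the broken variant comes back with 0x2000. The calling convention doesn't care which of the working variants a function picks.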

A non-reentrant / non-threadsafe function could in theory save its return address in static storage, if it wanted to.
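
A rough sketch of why static storage only works without re-entry, again as a toy model in plain C rather than real ARM code. The names toy_lr, saved_lr, toy_bl and static_save, and the addresses, are hypothetical. The function keeps its return address in a single static slot, and a depth argument lets it call itself to simulate re-entry.

```c
#include <assert.h>
#include <stdint.h>

static uint32_t toy_lr;     /* toy link register */
static uint32_t saved_lr;   /* the function's one and only static save slot */

/* bl overwrites LR with the new return address. */
void toy_bl(uint32_t return_addr) { toy_lr = return_addr; }

/* Returns the address it would jump back to. depth > 0 makes it re-enter. */
uint32_t static_save(int depth) {
    saved_lr = toy_lr;                 /* store LR to static storage */
    if (depth > 0) {
        toy_bl(0x5000u);               /* nested call clobbers LR ... */
        (void)static_save(depth - 1);  /* ... and the inner activation overwrites the slot */
    }
    toy_lr = saved_lr;                 /* reload LR from static storage */
    return toy_lr;                     /* bx lr */
}
```

Called once after toy_bl(0x1000), static_save(0) comes back with 0x1000 as intended. With static_save(1) the inner activation overwrites the slot, so the outer one comes back with 0x5000 instead of its caller's address. A second thread running the same function would stomp on the slot the same way.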

Source and GCC 8.2 -O2 -mapcs-frame output from Godbolt, forcing it to generate an APCS (ARM Procedure Call Standard) stack frame even when it's not needed. (It looks like it has a similar effect to -fno-omit-frame-pointer, which is the default with optimization disabled.)

void function_1(void){
   //some code here
}

function_1:
    bx      lr     @ with or without -mapcs-frame

void unknown_func(void);   // not visible to the compiler; can't inline
void function_2(void){
   function_1();   // inlined, or IPA optimized as pure and not needing to be called.
   unknown_func();
   unknown_func(); // tailcall
}

function_2:              @@ Without -mapcs-frame
    push    {r4, lr}         @ save LR like you expected
    bl      unknown_func
    pop     {r4, lr}         @ around a call
    b       unknown_func     @ but then tailcall for the 2nd call.

Or with APCS:

function_2:
    mov     ip, sp
    push    {fp, ip, lr, pc}
    sub     fp, ip, #4
    bl      unknown_func
    sub     sp, fp, #12
    ldm     sp, {fp, sp, lr}
    b       unknown_func

int func3(void){
    unknown_func();
    return 1;               // prevent tailcall
}

func3:           @@ Without -mapcs-frame
    push    {r4, lr}
    bl      unknown_func
    mov     r0, #1
    pop     {r4, pc}

Or with APCS:

func3:
    mov     ip, sp
    push    {fp, ip, lr, pc}
    sub     fp, ip, #4
    bl      unknown_func
    mov     r0, #1
    ldmfd   sp, {fp, sp, pc}

Since Thumb interworking isn't needed (with the default compile options), GCC pops the saved LR straight into PC instead of popping it back into LR and then using bx lr.

Pushing R4 along with LR keeps the stack aligned by 8, which IIRC is the default.
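
The arithmetic behind that can be sketched in C, assuming 4-byte core registers and the 8-byte alignment mentioned above. The helper names and the starting address 0x20001000 are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* On 32-bit ARM each pushed core register takes 4 bytes and sp moves down. */
uint32_t sp_after_push(uint32_t sp, unsigned nregs) { return sp - 4u * nregs; }

/* Non-zero if sp is a multiple of 8. */
int is_aligned8(uint32_t sp) { return (sp & 7u) == 0; }
```

Starting from an 8-byte-aligned sp such as 0x20001000, pushing {r4, lr} moves it down by 8 to 0x20000FF8, which is still aligned. Pushing only {lr} would leave it at 0x20000FFC, which is only 4-byte aligned. R4 is there as padding here, not because the function needs it preserved.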

Upvotes: 7
