Reputation: 131
As the title says, I'm having problems understanding the calling convention for the ARM architecture. In particular, I still struggle to know what happens with the LR register when you call a subroutine.
I think the most obvious and safest way to handle the LR register when you enter a subroutine is to store it on the stack, but that behaviour doesn't appear in the documentation, so I thought of the following example.
I'll write it in C because I think it is easier to explain that way. Imagine you have only two functions:
void function_1(void){
//some code here
}
void function_2(void){
//some code here
function_1();
//some code here
}
The way I would handle the LR register inside function_1 would be, as I said before, to store its value on the stack. But if you look closer, function_1 doesn't call any other subroutine, so that would be unnecessary.
Is it possible that an ARM compiler would decide not to store LR on the stack?
I read about the calling standard on the ARM Infocenter website.
Upvotes: 4
Views: 936
Reputation: 364458
The calling convention only defines what registers are call-preserved vs. call-clobbered, and where to find stack args.
It's 100% up to the function how it goes about making sure its return address is available somewhere when it's ready to return. The most trivial and efficient way to handle that is to just leave it in LR the whole time, in a leaf function. (A function that doesn't call others: it's a leaf in the call graph / tree).
Compilers in practice will usually just leave it in LR in leaf functions, even with optimization disabled. GCC, for example, sets up a frame pointer with optimization disabled, but still doesn't store/reload LR when it knows the function doesn't need so many scratch registers that it would want to use LR as one.
In non-leaf functions, compilers will typically store it to the stack, but if they wanted to they could, for example, save R4 to the stack and mov r4, lr, then restore LR and reload R4 when they're ready to return.
A non-reentrant / non-threadsafe function could in theory save its return address in static storage, if it wanted to.
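To make the two options concrete, here is a minimal sketch in C of a toy register/stack model. It is not real ARM semantics, and every name in it (ToyCpu, toy_call, nonleaf_via_stack, nonleaf_via_r4) is hypothetical. It only illustrates that both strategies get the original return address back into PC after a nested call has overwritten LR.

```c
#include <stdint.h>

/* Toy model, NOT real ARM semantics: a few registers plus a small
   full-descending stack. `sp` is a word index into `stack`. */
typedef struct {
    uint32_t r4, lr, pc;
    uint32_t sp;
    uint32_t stack[16];
} ToyCpu;

void toy_reset(ToyCpu *c, uint32_t r4, uint32_t lr) {
    c->r4 = r4;
    c->lr = lr;
    c->pc = 0;
    c->sp = 16;               /* empty stack */
}

void toy_push(ToyCpu *c, uint32_t v) { c->stack[--c->sp] = v; }
uint32_t toy_pop(ToyCpu *c) { return c->stack[c->sp++]; }

/* Stand-in for `bl callee` plus the callee returning: LR is overwritten,
   while r4 is call-preserved and therefore survives. */
void toy_call(ToyCpu *c) { c->lr = 0xDEADBEEFu; }

/* Strategy A: push {lr} ... pop {pc}  (stack alignment ignored here) */
void nonleaf_via_stack(ToyCpu *c) {
    toy_push(c, c->lr);
    toy_call(c);
    c->pc = toy_pop(c);
}

/* Strategy B: push {r4}; mov r4, lr; bl ...; mov lr, r4; pop {r4}; bx lr */
void nonleaf_via_r4(ToyCpu *c) {
    toy_push(c, c->r4);       /* r4 is call-preserved, so save the caller's value */
    c->r4 = c->lr;            /* keep the return address in r4 across the call */
    toy_call(c);
    c->lr = c->r4;            /* restore LR */
    c->r4 = toy_pop(c);       /* reload the caller's r4 */
    c->pc = c->lr;            /* bx lr */
}
```

In this model both functions end with PC equal to the LR value they were entered with, and strategy B also hands the caller's R4 back unchanged. Strategy B still needs one stack slot (for R4), which is one reason compilers have little incentive to prefer it.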
Source and GCC 8.2 -O2 -mapcs-frame output from Godbolt, forcing it to generate an APCS (ARM Procedure Call Standard) stack frame even when it's not needed. (It looks like it has a similar effect to -fno-omit-frame-pointer; the opposite, -fomit-frame-pointer, is on by default with optimization.)
void function_1(void){
//some code here
}
function_1:
bx lr @ with or without -mapcs-frame
void unknown_func(void); // not visible to the compiler; can't inline
void function_2(void){
function_1(); // inlined, or IPA optimized as pure and not needing to be called.
unknown_func();
unknown_func(); // tailcall
}
function_2: @@ Without -mapcs-frame
push {r4, lr} @ save LR like you expected
bl unknown_func
pop {r4, lr} @ around a call
b unknown_func @ but then tailcall for the 2nd call.
Or with APCS:
mov ip, sp
push {fp, ip, lr, pc}
sub fp, ip, #4
bl unknown_func
sub sp, fp, #12
ldm sp, {fp, sp, lr}
b unknown_func
int func3(void){
unknown_func();
return 1; // prevent tailcall
}
func3: @@ Without -mapcs-frame
push {r4, lr}
bl unknown_func
mov r0, #1
pop {r4, pc}
Or with APCS:
func3:
mov ip, sp
push {fp, ip, lr, pc}
sub fp, ip, #4
bl unknown_func
mov r0, #1
ldmfd sp, {fp, sp, pc}
Since Thumb interworking isn't needed (with the default compile options), GCC will pop the saved LR into PC instead of just back into LR for bx lr.
Pushing R4 along with LR keeps the stack aligned by 8, which IIRC is the default.
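The alignment arithmetic can be checked with a small sketch in C. The helper names and the starting SP value used in the note below are hypothetical; the only assumption is that each pushed core register takes 4 bytes on a descending stack.

```c
#include <stdint.h>

/* SP after pushing `nregs` 32-bit registers on a full-descending stack. */
uint32_t sp_after_push(uint32_t sp, unsigned nregs) {
    return sp - 4u * nregs;
}

/* Nonzero if `sp` is a multiple of 8. */
int is_aligned8(uint32_t sp) {
    return (sp & 7u) == 0;
}
```

Starting from an 8-byte-aligned SP such as 0x20001000, pushing only {lr} leaves SP at 0x20000FFC, which is not 8-byte aligned, while pushing {r4, lr} leaves it at 0x20000FF8, which is. The four-register APCS push {fp, ip, lr, pc} moves SP by 16 bytes and so also keeps it aligned.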
Upvotes: 7