Reputation: 962
I would like to call an assembly function from C. It is part of a basic example for calling conventions.
The function is a basic:
int mult(int A, int B){
return A*B;
}
According to the Procedure Call Standard for the ARM® Architecture, the parameters A and B should be in registers r0 and r1 respectively for the function call. The return value should be in r0.
Essentially then I would expect the function to be:
EXPORT mult
mult MUL r0, r0, r1
BX lr
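For reference, the C side of such a call could look roughly like the sketch below. It assumes the assembly routine is linked in as a global symbol named mult. The macro HAVE_ASM_MULT and the caller area are hypothetical names used only for illustration, and the C body is just a stand-in so the snippet builds without the assembly file.

```c
/* Prototype the C caller sees. Under the AAPCS, A is passed in r0,
   B in r1, and the result comes back in r0. */
int mult(int A, int B);

#ifndef HAVE_ASM_MULT  /* hypothetical macro: define it when linking the assembly version */
/* Stand-in body so this sketch builds on its own. */
int mult(int A, int B)
{
    return A * B;
}
#endif

/* Hypothetical caller. On an AAPCS target the compiler loads width into r0
   and height into r1 before branching to mult. */
int area(int width, int height)
{
    return mult(width, height);
}
```

Nothing in the C source mentions registers; the prototype alone is enough for the compiler to apply the calling convention.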
With GCC 7.2.1 (none) -O1 -mcpu=cortex-m4 -mabi=aapcs, I get the following (using Compiler Explorer):
mult:
mul r0, r1, r0
bx lr
Which is what I expected. However, if I disable optimizations (-O0) I get the following nonsense:
mult:
push {r7}
sub sp, sp, #12
add r7, sp, #0
str r0, [r7, #4]
str r1, [r7]
ldr r3, [r7, #4]
ldr r2, [r7]
mul r3, r2, r3
mov r0, r3
adds r7, r7, #12
mov sp, r7
pop {r7}
bx lr
I think this means GCC is using r7 as a frame pointer and passing all of the parameters and return values via the stack, which is not according to the AAPCS.
Is this a bug with Compiler Explorer or GCC, or have I missed something in the AAPCS? Why would -O0 have a fundamentally different calling convention than the one specified in the AAPCS document?
Upvotes: 0
Views: 281
Reputation: 22420
In my opinion, this is not due to debugging. -O0 takes out the optimization passes. As a result, the compiler doesn't see that everything fits in registers, nor that you don't call other functions. Hence it will always set up a stack frame, with r7 as the frame pointer in Thumb-2 (Cortex-M4).
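One way to picture the -O0 output is as a C function that spills every parameter to a stack slot and reloads it before use. The sketch below is only a model: the function and variable names are made up for illustration, and volatile is used here merely to force the memory round trip that -O0 performs anyway. The comments map each statement to the instruction from the listing in the question.

```c
/* Source-level model of the -O0 code: each parameter gets a stack slot,
   and each use reloads from that slot. */
int mult_unoptimized_model(int A, int B)
{
    volatile int slot_a = A;  /* str r0, [r7, #4] */
    volatile int slot_b = B;  /* str r1, [r7]     */

    int lhs = slot_a;         /* ldr r3, [r7, #4] */
    int rhs = slot_b;         /* ldr r2, [r7]     */
    int product = rhs * lhs;  /* mul r3, r2, r3   */

    return product;           /* mov r0, r3       */
}
```

The arguments still arrive in r0 and r1 and the result still leaves in r0; the stack traffic is internal to the function and invisible to the caller.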
If you code a much busier function, you will see a stack frame even at -O3. See why compiler writers try to get rid of them? They make the code harder to understand, and they are also a horrible amount of code. LTO goes even further and would see that,
mov r0, xx # our call site, might also have to save r0-r3.
mov r1, yy # because mult might trash those.
bl mult
...
mult:
mul r0, r1, r0
bx lr
Can be replaced by,
mul xx,yy,xx # one instruction!
It is quite common for call overhead to be as large as the actual function body. Other features like a macro, an inline keyword or attribute, etc. can achieve similar effects. Compilers are really good at allocating registers and getting rid of mov instructions. Your brain (or at least mine) is better at mapping high-level problems to specific machine instructions, like clz, adc, etc. This is especially true if the higher-level language doesn't have a way to denote what you want to do (use a carry, etc.).
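As an illustration of the carry point, here is a minimal sketch of a two-word addition in portable C. The type u64_halves and the function add_halves are hypothetical names. C has no way to name the carry flag, so the carry has to be recovered with a comparison, whereas hand-written ARM assembly would typically use an adds/adc pair.

```c
#include <stdint.h>

/* Hypothetical type: a 64-bit value held as two 32-bit halves. */
typedef struct {
    uint32_t lo;
    uint32_t hi;
} u64_halves;

/* Add two values word by word, propagating the carry by hand. */
u64_halves add_halves(u64_halves x, u64_halves y)
{
    u64_halves sum;

    sum.lo = x.lo + y.lo;              /* wraps modulo 2^32 */
    uint32_t carry = (sum.lo < x.lo);  /* 1 if the low-word add wrapped */
    sum.hi = x.hi + y.hi + carry;

    return sum;
}
```

Whether a given compiler turns the comparison back into a single add-with-carry depends on the compiler and optimization level, which is the kind of mapping the paragraph above has in mind.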
Upvotes: 1
Reputation: 6354
Don't bother analyzing machine code compiled in debug mode, because it follows some very obscure sequences that allow step-by-step execution with breakpoints while keeping all the global/local variables visible.
It isn't only pointless, but also more confusing if what you want is to learn assembly.
Go for -O2 or even -O3 all the time.
Upvotes: 3
Reputation: 962
Thanks to Marc Glisse for pointing out the obvious.
What is happening is that GCC is storing r0 (A) and r1 (B) on the stack. Then it loads them back into r2 and r3. Then it multiplies them into r3. Then it moves r3 into the return register r0. This seems like it is actively trying to make things slower...
But it is still AAPCS.
My bad.
Thanks Marc
Edit:
As Jake 'Alquimista' LEE mentions, this might make sense for debugging: all of the function's values are available to the debugger on the stack.
Upvotes: 0