user1042840

Reputation: 1945

Function called in a file without a prototype produces different results on ARM and x86-64

We have 3 files: main.c, lib.h and lib.c:

main.c:

#include <stdio.h>
#include <stdlib.h>

/* #include "lib.h" */

int main(void)
{
    printf("sizeof unsigned long long: %zu\n", sizeof(unsigned long long));
    printf("sizeof int: %zu\n", sizeof(int));
    unsigned long long slot = 0;
    int pon_off = 1;
    lib_fn(slot, pon_off);
    return EXIT_SUCCESS;
}

lib.h:

void lib_fn(unsigned slot, int pon_off);

lib.c:

#include <stdio.h>
#include <stdlib.h>

void lib_fn(unsigned slot, int pon_off)
{
    printf("slot: %d\n", slot);
    printf("pon_off: %d\n", pon_off);
    return;
}

Compile:

gcc -O2 -Wall -Wextra main.c lib.c

Run on ARM:

$ ./a.out
sizeof unsigned long long: 8
sizeof int: 4
slot: 0
pon_off: 0

Run on x86-64:

$ ./a.out
sizeof unsigned long long: 8
sizeof int: 4
slot: 0
pon_off: 1

As you can see, pon_off is 0 on ARM but 1 on x86-64. I guess it has something to do with argument sizes, since lib_fn() takes two 4-byte integers that are together 8 bytes long, and a single long long is also 8 bytes long.

  1. Why is pon_off printed differently on ARM and x86-64?

  2. Does it have something to do with a calling convention?

Upvotes: 1

Views: 188

Answers (3)

Peter Cordes

Reputation: 364532

Does it have something to do with a calling convention?

Yes, it has everything to do with the calling convention / ABI.

On x86-64, the "natural" width of a function argument is 64 bits, and narrower integer args still use a whole "slot". (First 6 integer/pointer args and first 8 FP args in registers (SysV) or first 4 args (Windows), then stack).

On ARM, the register width (and "arg slot" minimum width on the stack) is 32 bits, and 64-bit integer args take two registers.

On 32-bit x86 (gcc -m32) you would see the same behaviour as 32-bit ARM. On AArch64, you would see the same behaviour as x86-64, because their calling conventions are all "normal" and don't pack separate narrow args into single registers. (x86-64 System V does pack struct members into up to 2 registers, though, instead of using a separate register per member!)

Having a minimum "arg slot" width that's equal to the register size is nearly universal, whether args are passed in registers or on the stack. This isn't necessarily the width of int, though: AVR (8-bit RISC microcontroller) has 16-bit int which takes two registers, but char / uint8_t args can be passed in a single register.


With a prototype, wider/narrower types are converted to what the callee expects, according to the types in the prototype. So obviously everything works.

Without a prototype, the type of the expression in the call determines how the arg is passed. unsigned long long slot takes the first 2 arg-passing registers in ARM's calling convention, where lib_fn expects to find its 2 integer args.

(The answer claiming everything is converted to int without a prototype is wrong. No prototype is equivalent to int lib_fn(...);, but printf still works with double and int64_t. Note that float is implicitly converted to double when passing to a variadic function, just like narrower integer types are up-converted to int, which is why %f is the format for double, and there is no format for float, unlike with scanf where you pass pointers. That's just how C is designed; there's no reason for it. But anyway, C requires that wider types are able to be passed as is to variadic functions, and all calling conventions accommodate that.)


BTW, other breakage is possible: Some implementations use a different calling convention for variadic (and thus unprototyped) functions than for normal functions.

For example, on Windows you can set some compilers to default to the _stdcall calling convention, where the callee pops the args from the stack. (i.e. with a ret 8 to do esp+=8 after popping the return address.) But obviously this calling convention isn't usable for variadic functions, so the default doesn't apply to them, and they would use _cdecl or something where the caller is responsible for cleaning up stack args, because only the caller knows for sure how many args they passed. Hopefully in this mode compilers would at least warn if not error for implicitly declared functions, because getting it wrong leads to a crash (stack pointing to the wrong place after a call).


Let's have a look at the asm for this case

For an introduction to reading compiler asm output, see How to remove "noise" from GCC/clang assembly output?, and especially Matt Godbolt's CppCon2017 talk “What Has My Compiler Done for Me Lately? Unbolting the Compiler's Lid”.

To make the asm as simple as possible, I removed the printing and put the code in a function that returns void. (This allows tail-call optimization where you jump to the function and it returns to your caller.) The only instructions in the compiler output are the arg setup and jumping to lib_fn.

#ifdef USE_PROTO
void lib_fn(unsigned slot, int pon_off);
#endif

void foo(void) {
    unsigned long long slot = 0;
    int pon_off = 1;
    lib_fn(slot, pon_off);
}

See the source+asm on the Godbolt compiler explorer, for ARM, x86-64, and x86-32 (-m32) with gcc 6.3. (I actually copied foo and renamed lib_fn so it would have no prototype in one version of the caller, instead of having 2 separate compiler windows for each architecture. In a more complex case, that would be handy because you can diff between compiler panes).

For x86-64, the output is basically the same with/without the prototype. Without, the caller has to zero al (using xor eax,eax to zero the whole RAX) to indicate that this variadic function call is passing no FP args in XMM registers. (In the Windows calling convention, you wouldn't have that because the Windows convention is optimized for variadic functions and simplicity of implementing them at the expense of normal functions.)

For ARM:

foo:                  @ no prototype
    mov     r2, #1    @ pon_off
    mov     r0, #0    @ slot low half
    mov     r1, #0    @ slot high half
    b       lib_fn_noproto


bar:                  @ with proto, u long long is converted to unsigned according to C rules, like the callee expects
    mov     r1, #1
    mov     r0, #0
    b       lib_fn

lib_fn is expecting slot in R0 and pon_off in R1.


Breaking x86-64

You'd have the same problem on x86-64 if you used unsigned __int128.

lib_fn_noproto((unsigned __int128)slot, pon_off);

compiles to:

    mov     edx, 1          # pon_off = EDX = 1
    xor     edi, edi        #  low half of slot = RDI = 0
    xor     esi, esi        # high half of slot = RSI = 0
    xor     eax, eax        # number of xmm register args = 0
    jmp     lib_fn_noproto

which breaks the calling convention in exactly the same way as for 32-bit ARM with a 64-bit arg taking the first 2 slots.

Upvotes: 4

nowaqq

Reputation: 271

This is because of how x86-64 and ARM pass arguments to functions (as Peter Cordes mentioned in his comment).

Please compare the generated assembly on ARM and on x86-64:

  1. On ARM, unsigned long long is stored in 2 registers and int in 1; on x86-64, each is stored in its own 64-bit register.
  2. On ARM, the function reads a single register for each argument, so the low and high halves of the 1st variable are taken as its two arguments. The 2nd argument actually passed is, in the end, ignored. On x86-64 the function still gets its values from those two 64-bit registers.

Side note: on x86-64 only the first few function arguments are passed in registers; if there are more, the remaining arguments are stored on the stack.

Upvotes: 1

unalignedmemoryaccess

Reputation: 7441

If there is no function prototype and an implicit declaration is used, the compiler assumes that all parameters are of type int.

It looks like int is different on ARM and on the x86-64 architecture.

Note that the %d conversion specifier can only be used with an int argument; for an unsigned one, use %u.

That's why there are warnings for you.

Upvotes: 0
