Carol Victor

Reputation: 331

Returning Vs. Pointer

How much would performance differ between these two situations?

int func(int a, int b) { return a + b; }

And

void func(int a, int b, int * c) { *c = a + b; }

Now, what if it's a struct?

typedef struct { int a; int b; char c; } my;

my func(int a, int b, char c) { my x; x.a = a; x.b = b; x.c = c; return x; }

And

void func(int a, int b, int c, my * x) { x->a = a; x->b = b; x->c = c; }

One thing I can think of is that a register cannot be used for this purpose, correct? Other than that, I am unaware of how this function would turn out after going through a compiler.

Which would be more efficient and speedy?

Upvotes: 1

Views: 106

Answers (1)

Peter Cordes

Reputation: 364039

If the function can inline, there's often no difference between the first two.

Otherwise (no inlining, e.g. because the definition is in another translation unit and there's no link-time optimization), returning an int by value is more efficient because it's just a value in a register that can be used right away. Also, the caller doesn't have to pass as many args, or find/make space to point at. With the output-pointer version, if the caller does want to use the output value, it has to reload it from memory, introducing latency in the total dependency chain from inputs ready to output ready. (Store-forwarding latency is ~5 cycles on modern x86 CPUs, vs. 1 cycle latency for the lea eax, [rdi + rsi] that would implement that function for x86-64 System V.)

The exception is maybe for rare cases where the caller isn't going to use the value, just wants it in memory at some address. Passing that address to the callee (in a register) so it can be used there means the caller doesn't have to keep that address anywhere that will survive across the function call.
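As a rough sketch of the caller's side of that trade-off (the names add_val, add_out and use_both are made up for illustration; the question calls both versions func), assuming the calls are not inlined:

```c
#include <assert.h>

/* Return by value: the result comes back in a register (EAX on x86-64). */
int add_val(int a, int b) { return a + b; }

/* Output pointer: the result is stored to memory the caller provides. */
void add_out(int a, int b, int *c) { *c = a + b; }

/* Hypothetical caller that consumes the result right away. */
int use_both(int a, int b)
{
    int direct = add_val(a, b);  /* value is usable as soon as the call returns */

    int slot;                    /* caller has to find/make space to point at */
    add_out(a, b, &slot);        /* callee stores through the pointer ... */
    int reloaded = slot;         /* ... and the caller reloads it from memory */

    assert(direct == reloaded);  /* same answer, different data path */
    return direct + reloaded;
}
```

Both paths compute the same value; without inlining, only the second has to go through a store and a reload before the caller can use the result.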


For the struct version:

a register cannot be used for this purpose, correct?

No, for some calling conventions, small structs can be returned in registers.

x86-64 System V will return your my struct by value in the RDX:RAX register pair because it's no larger than 16 bytes (12 here, including padding) and all integer. (And trivially copyable.) Try it on https://godbolt.org/z/x73cEh:

# clang11.0 -O3 for x86-64 SysV
func_val:
        shl     rsi, 32
        mov     eax, edi
        or      rax, rsi             # (uint64_t)b<<32 | a;  the low 64 bits of the struct
    # c was already in EDX, the low half of RDX; clang leaves it there.
        ret
func_out:
        mov     dword ptr [rcx], edi
        mov     dword ptr [rcx + 4], esi        # just store the struct members 
        mov     byte ptr [rcx + 8], dl          # to memory pointed-to by 4th arg
        ret
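For reference, C source along these lines should produce asm like the above. This is a sketch that reuses the func_val / func_out labels from the asm and the bodies from the question; the size check and the same_members helper are my additions, not part of the original code.

```c
#include <assert.h>

typedef struct { int a; int b; char c; } my;

/* Assumption being checked: the struct fits in two 8-byte registers,
   which is the size limit for register return on x86-64 System V. */
static_assert(sizeof(my) <= 16, "my must fit in two 8-byte registers");

/* By value: on x86-64 System V this comes back in RDX:RAX. */
my func_val(int a, int b, char c)
{
    my x;
    x.a = a;
    x.b = b;
    x.c = c;
    return x;
}

/* Output pointer: three plain stores through the 4th argument. */
void func_out(int a, int b, int c, my *x)
{
    x->a = a;
    x->b = b;
    x->c = c;
}

/* Hypothetical helper: returns 1 when both styles fill in the same members. */
int same_members(int a, int b, char c)
{
    my v = func_val(a, b, c);
    my o;
    func_out(a, b, c, &o);
    return v.a == o.a && v.b == o.b && v.c == o.c;
}
```

Other calling convention / struct combinations may fail the size rule and fall back to memory, so treat the static_assert as documentation of an assumption rather than a guarantee about code-gen.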

GCC doesn't assume that char c is correctly sign-extended to EDX the way clang does (an unofficial ABI feature). Instead, GCC does a really dumb byte store / dword reload that creates a store-forwarding stall, to get uninitialized garbage from memory instead of from the high bytes of EDX. That's purely a missed optimization; see it in https://godbolt.org/z/WGcqKc. It also insanely uses SSE2 to merge the two integers into a 64-bit value before doing a movq rax, xmm0, or a store to memory for the output-arg version.

You definitely want the struct version to inline if the caller uses the values, so this packing into return-value registers can be optimized away.

"How does function ACTUALLY return struct variable in C?" has an ARM example for a larger struct: with return by value, the caller passes a hidden pointer to its return-value object. From there, the result may need to be copied by the caller if it's being assigned to something that escape analysis can't prove is private (e.g. through some pointer). See "What prevents the usage of a function argument as hidden pointer?"

Also related: "Why is tailcall optimization not performed for types of class MEMORY?"

"How do C compilers implement functions that return large structures?" points out that code-gen may differ between C and C++.
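To make the larger-struct case concrete, here's a minimal sketch. The type big and all the function names are hypothetical; it's 64 bytes with 4-byte int, so too large for register return on x86-64 System V. At the asm level make_big_val typically gets a hidden pointer and ends up looking a lot like make_big_out; the difference is whether the caller can hand over the final destination directly.

```c
#include <assert.h>

enum { BIG_N = 16 };
typedef struct { int v[BIG_N]; } big;   /* hypothetical large struct */

/* By value: the caller typically passes a hidden pointer to the return object. */
big make_big_val(int seed)
{
    big r;
    for (int i = 0; i < BIG_N; i++)
        r.v[i] = seed + i;
    return r;
}

/* Explicit output pointer: writes straight into the caller's object. */
void make_big_out(int seed, big *out)
{
    for (int i = 0; i < BIG_N; i++)
        out->v[i] = seed + i;
}

int sum_big(const big *p)
{
    int s = 0;
    for (int i = 0; i < BIG_N; i++)
        s += p->v[i];
    return s;
}

/* Hypothetical caller showing the private vs. possibly-aliased destinations. */
int demo_big(int seed, big *shared)
{
    big local = make_big_val(seed);  /* private object: can usually be built in place */
    *shared = make_big_val(seed);    /* may be built in a temporary, then copied */
    make_big_out(seed, shared);      /* no temporary: stores go through the pointer */
    assert(sum_big(&local) == sum_big(shared));
    return sum_big(&local);
}
```

Whether the middle assignment actually costs an extra copy depends on the compiler and on what it can prove about shared, so check the asm for the case you care about.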

I don't know how to explain any general rule of thumb that one could apply without understanding asm and the calling convention you care about. Usually pass/return large structs by reference, but for small structs it's very much "it depends".

Upvotes: 3
