Captain Midnight

Reputation: 33

Function return value optimization?

I am writing my own programming language, which, for various reasons, compiles to C (one reason being that I know next to nothing about assembly).

I have a question about how a compiler (say GCC or Clang) might optimize returning a value from a function. Let's say I have code like this:

int FUNC()
{
    int A = 3;
    return A;
}

int main()
{
    int B = FUNC();
}

My understanding is that you'd expect the variable A to be copied to B upon return from FUNC (which could be expensive if A and B are structs). Would a compiler recognize that in this case B can just point to wherever A is and that a copy is not needed?

What if main() looks like this?

int main()
{
    int C;
    C = FUNC();
}

Thank you!

Upvotes: 0

Views: 1214

Answers (3)

Eric Postpischil

Reputation: 222332

Good compilers go beyond the optimizations you suggest. Apple Clang 11 with -O3 compiles your main routine to:

_main:
    pushq   %rbp
    movq    %rsp, %rbp
    xorl    %eax, %eax
    popq    %rbp
    retq

Thus the compiler has gone beyond your suggestions of coalescing A and B; it has removed them entirely.

If we include <stdio.h> and insert printf("%d\n", B); in main, it is compiled to:

_main:
    pushq   %rbp
    movq    %rsp, %rbp
    leaq    L_.str(%rip), %rdi
    movl    $3, %esi
    xorl    %eax, %eax
    callq   _printf
    xorl    %eax, %eax
    popq    %rbp
    retq

Now A and B are not completely removed, but they have been reduced to a single immediate constant in an instruction.

Upvotes: 0

Steve Summit

Reputation: 47923

There are basically two cases: (1) the return value is something other than a struct, and (2) the return value is a struct.

For case (1), the return value is typically in a register -- it used to always be r0, or maybe f0 in the case of floating-point returns, or maybe r0+r1 in the case of long int returns.

In that case, when you have something like

int a()
{
    return 3;
}

int b()
{
    return a();
}

the compiler basically compiles b() to some code that calls function a, and that's it. Function a returns its value in whichever register int-valued functions return, and that's just where it needs to be for function b to return it, so there's nothing else to do; the value is already where it belongs. And therefore, at least in this case, there's no extra copying involved.

(This can also lead to situations where seemingly "wrong" code works anyway, and thus to this Frequently-Asked Question: "Function returns value without return statement?")

But then, in your function main, where you wrote int B = FUNC(), then yes, there may be a "copy" from the return register to B's location. (Although, these days, a smart compiler may remember that "for now, r0 is B" and skip the copy.)

For structs, on the other hand (that is, case 2), and especially for large ones, the compiler typically passes an extra, hidden argument which is a pointer to the location in the caller where the return value should go. That is, if you have

struct largestruct bb();

int main()
{
    struct largestruct B;
    B = bb();
}

it will be compiled more or less as if you had written

void bb(struct largestruct *);

int main()
{
    struct largestruct B;
    bb(&B);
}

So if you then have

extern struct largestruct aa();

struct largestruct bb()
{
    return aa();
}

it will probably be compiled as if it were written

extern void aa(struct largestruct *);

void bb(struct largestruct *__retp)
{
    aa(__retp);
}

And, again, it's more or less true that "the pointer points to the right place, and no copy is needed".

Upvotes: 2

Lundin

Reputation: 213458

Generally, it is free to write things like:

int tmp = a + b;
int c = tmp;

tmp will get optimized out.

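Your function will first of all be optimized in a similar way: A is never allocated, and the function just returns 3. That constant is encoded as an immediate operand inside the instruction itself, so it lives together with the machine code rather than in a separate variable.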

Function inlining works similarly: the whole function call in this case gets replaced by 3.

Now if we add a side effect such as printing, printf("%d", B);, your whole program gets optimized to something like this pseudo-assembler:

  • move 3 into register x
  • move address of the string literal "%d" into register y
  • call printf(y,x)

In real x86 asm that would be something like:

mov     esi, 3
mov     edi, OFFSET FLAT:.LC0
call    printf

The variables A and B and the function FUNC are all optimized away.

Upvotes: 0
