Reputation: 337

Access through reference overhead vs copy overhead

Let's say that I want to pass a POD object to a function as a const argument. I know that for simple types like int and double, passing by value is better than passing by const reference because of the reference overhead. But at what size is it worth passing by reference instead?

struct arg
{
  ...
};

void foo(const arg input)
{
  // read from input
}

void foo(const arg& input)
{
  // read from input
}

i.e., at what size of struct arg should I start using the latter approach?

I should also mention that I'm not talking about copy elision here. Let's suppose that it doesn't happen.

Upvotes: 2

Answers (3)

Guillaume Racicot

Reputation: 41800

In addition to the other responses, there are also optimization concerns.

Since it's a reference, the compiler cannot know whether the reference points to a mutable global variable. When calling any function whose definition is not visible in the current translation unit (TU), the compiler must assume the referenced object may have been mutated.

For example, if you have an if that depends on a data member of Foo, then call a function, then use the same data member again, the compiler is forced to emit two separate loads, whereas if the variable is local, it knows the variable cannot be mutated elsewhere. Here's an example:

struct Foo {
    int data;
};

extern void use_data(int);

void bar(Foo const& foo) {
    int const& data = foo.data;

    // may mutate foo.data through a global Foo
    use_data(data);

    // must load foo.data again through the reference
    use_data(data);
}

If the variable is local, the compiler will simply reuse the value already inside the registers.
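The aliasing concern can also be shown at the language level rather than in assembly. The following is only a sketch: opaque_call, delta_by_ref and delta_by_value are hypothetical names, and the mutation goes through an explicit parameter instead of a global so that the whole thing can be checked at compile time (C++14 or later assumed).

```cpp
// Hypothetical illustration; none of these names come from the answer above.
struct Foo {
    int data;
};

// Stand-in for an external function whose body the caller cannot see:
// it mutates a Foo that it can reach through some other path.
constexpr void opaque_call(Foo& reachable) {
    reachable.data += 1;
}

// By reference: foo may alias reachable, so foo.data has to be read again
// after the call.
constexpr int delta_by_ref(Foo const& foo, Foo& reachable) {
    int const before = foo.data;
    opaque_call(reachable);
    return foo.data - before;
}

// By value: foo is a private copy, so the call cannot change it and the
// value read before the call is still valid afterwards.
constexpr int delta_by_value(Foo const foo, Foo& reachable) {
    int const before = foo.data;
    opaque_call(reachable);
    return foo.data - before;
}

// Pass the same object as both arguments to force the aliasing case.
constexpr int aliased_ref_delta() {
    Foo f{41};
    return delta_by_ref(f, f);
}

constexpr int aliased_value_delta() {
    Foo f{41};
    return delta_by_value(f, f);
}

static_assert(aliased_ref_delta() == 1, "the reference observes the mutation");
static_assert(aliased_value_delta() == 0, "the copy is unaffected");
```

Because the by-reference version can legitimately observe a different value after the call, the compiler is not allowed to keep the first load in a register across it.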

Here's a compiler explorer example that shows the optimization being applied only if the variable is local.

This is why the "general advice" will give you good performance, but won't give you optimal performance. You must measure and profile your code if you truly care about its performance.

Upvotes: 1

rustyx

Reputation: 85471

TL;DR: This depends highly on the target architecture, the compiler and the context in which the functions are invoked. When unsure, profile and manually inspect generated code.

If the functions are inlined, a good optimizing compiler will probably emit the exact same code in both cases.

If the functions are not inlined, however, the ABI on most C++ implementations dictates that a const& argument is passed as a pointer. That means the structure has to reside in memory so that its address can be taken. This can have a significant impact on performance for small objects.

Let's take x86_64 Linux G++ 8.2 as an example...

A struct with 2 members:

struct arg
{
    int a;
    long b;
};

int foo1(const arg input)
{
    return input.a + input.b;
}

int foo2(const arg& input)
{
    return input.a + input.b;
}

Generated assembly:

foo1(arg):
        lea     eax, [rdi+rsi]
        ret
foo2(arg const&):
        mov     eax, DWORD PTR [rdi]
        add     eax, DWORD PTR [rdi+8]
        ret

The first version passes the structure entirely in registers; the second receives a pointer in rdi and has to load the members from memory.

Now let's try 3 members:

struct arg
{
    int a;
    long b;
    int c;
};

int foo1(const arg input)
{
    return input.a + input.b + input.c;
}

int foo2(const arg& input)
{
    return input.a + input.b + input.c;
}

Generated assembly:

foo1(arg):
        mov     eax, DWORD PTR [rsp+8]
        add     eax, DWORD PTR [rsp+16]
        add     eax, DWORD PTR [rsp+24]
        ret
foo2(arg const&):
        mov     eax, DWORD PTR [rdi]
        add     eax, DWORD PTR [rdi+8]
        add     eax, DWORD PTR [rdi+16]
        ret

Not a whole lot of difference anymore, although using the second version will still be a bit slower because it requires the address to be put in rdi.
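The switch from registers to the stack between the two listings lines up with the object size. As far as I understand the System V x86-64 calling convention, trivially copyable aggregates of up to 16 bytes (two eightbytes) can travel in registers, while larger ones are passed in memory. The check below is a simplified sketch of that rule, not the full ABI classification; arg2, arg3, kRegisterLimit and may_fit_in_registers are hypothetical names, and the 16-byte limit is an assumption that only applies to that ABI.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical copies of the two structs used above.
struct arg2 {
    int a;
    long b;
};

struct arg3 {
    int a;
    long b;
    int c;
};

// Assumption: System V x86-64 limit of two eightbytes. Other ABIs differ.
constexpr std::size_t kRegisterLimit = 16;

// Simplified test: trivially copyable and small enough. The real ABI rules
// also look at member types and alignment.
template <typename T>
constexpr bool may_fit_in_registers() {
    return std::is_trivially_copyable<T>::value && sizeof(T) <= kRegisterLimit;
}

static_assert(may_fit_in_registers<arg2>(), "small struct stays within the limit");
static_assert(sizeof(arg3) > sizeof(arg2), "the third member grows the struct");
```

With an 8-byte long, padding makes arg2 16 bytes and arg3 24 bytes, which would explain why only the three-member version is read from [rsp+...] in foo1.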

Does it really matter that much?

Usually not. If you care about performance of a particular function, it's probably called frequently and is therefore small. As such, it will most likely be inlined.

Let's try invoking the two functions above:

int test(int x)
{
    arg a {x, x};
    return foo1(a) + foo2(a);
}

Generated assembly:

test(int):
        lea     eax, [0+rdi*4]
        ret

Voilà. It's all moot now. The compiler inlined and merged both functions into a single instruction!

Upvotes: 3

eerorika

Reputation: 238421

A reasonable rule of thumb: If the size of the class is the same as or less than the size of a pointer, then copying may be a bit faster.

If the size of the class is slightly larger, then it may be hard to predict. The difference is often insignificant.

If the size of the class is humongous, then copying is likely slower. That said, the point is moot, since humongous objects can't in practice have automatic storage because stack space is limited.
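If you want to encode this rule of thumb instead of deciding per function, one possible sketch is a type alias that picks the parameter form from the size. param_t, Tiny and Big are hypothetical names, and the pointer-size threshold is just the rule above, not a measured optimum.

```cpp
#include <type_traits>

// Picks "T const" for small trivially copyable types and "T const&" otherwise.
// The threshold (size of a pointer) is an assumption taken from the rule of thumb.
template <typename T>
using param_t = typename std::conditional<
    std::is_trivially_copyable<T>::value && sizeof(T) <= sizeof(void*),
    T const,
    T const&>::type;

// Hypothetical example types.
struct Tiny {
    char c;
};

struct Big {
    long values[8];
};

// Usage with a concrete type: Big is larger than a pointer, so input is a
// const reference here.
constexpr long first_value(param_t<Big> input) {
    return input.values[0];
}

static_assert(std::is_same<param_t<Tiny>, Tiny const>::value, "small: by value");
static_assert(std::is_same<param_t<Big>, Big const&>::value, "large: by const reference");
```

Note that T cannot be deduced through such an alias in a function template, so this only works where the type is spelled out explicitly.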

If the function is expanded inline, then there is probably no difference whatsoever.

To find out whether one program is faster than the other on a particular system, and whether the difference is significant in the first place, you can use a profiler.

Upvotes: 1
