Loryan55

Reputation: 325

Using reference arguments in function definition: performance?

Is it known which of the two variants below runs faster, whether they are the same, or whether it is incorrect to compare them?

Vector test(Vector &vec)
{
 // return modified vector, or write directly to vec,
 // or do not return anything, but access vec anyway
}

Vector test(Vector vec)
{
 // same (but no reference)
}

I am asking because I should probably know this in order to write well-optimized code for a Direct3D game.

UPDATE: I am talking about XMVECTOR from xnamath.h (D3D SDK): 16 bytes, 4 floats.

Upvotes: 0

Views: 94

Answers (5)

Potatoswatter

Reputation: 137930

This isn't the sort of thing that is useful to generalize about.

Googling for XMVECTOR, I get

typedef __m128 XMVECTOR;

Therefore despite being 16 bytes, it's all one SSE machine register, so you should certainly pass this sucker by value. Taking a reference to something in a register only risks forcing it onto the stack.
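To make the two options concrete, here is a minimal sketch assuming an x86/x64 target with SSE and the `__m128` typedef quoted above. The helper names (`scale_by_value`, `scale_by_reference`, `first_lane`, `demo_scale`) are hypothetical and not part of xnamath.h. Both versions compute the same thing; whether the by-value one really stays in a register depends on the compiler and calling convention, so check the generated assembly rather than trusting the sketch.

```cpp
#include <cassert>
#include <xmmintrin.h>  // SSE intrinsics; assumes an x86/x64 target

// Hypothetical helper: the vector is a by-value parameter, so the compiler
// is free to keep it in an XMM register across the call.
__m128 scale_by_value(__m128 v, float s)
{
    return _mm_mul_ps(v, _mm_set1_ps(s));
}

// Hypothetical helper: the vector is a reference parameter, so unless the
// call is inlined the caller must have it in memory to give it an address.
__m128 scale_by_reference(const __m128 &v, float s)
{
    return _mm_mul_ps(v, _mm_set1_ps(s));
}

// Extracts lane 0 so the result can be checked.
float first_lane(__m128 v)
{
    return _mm_cvtss_f32(v);
}

void demo_scale()
{
    // _mm_set_ps takes lanes from highest to lowest, so lane 0 is 1.0f.
    __m128 v = _mm_set_ps(4.0f, 3.0f, 2.0f, 1.0f);
    assert(first_lane(scale_by_value(v, 2.0f)) == 2.0f);
    assert(first_lane(scale_by_reference(v, 2.0f)) == 2.0f);
}
```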

EDIT: Even if you aren't using the above typedef, XMVECTOR may still be a special type treated differently by the compiler. Observe the notes about the XBox platform. In any case, what I say below counts doubly:


Treating micro-optimization as idiomatic is the wrong approach. Micro-optimization starts at the machine code. The starting point here should be whatever machine instructions the profiler points at, because there are so many tiny bits and pieces in any program that you won't find the slow part just by intuition.

If you are just getting started on your first optimization project, you should research different profiling tools (which tell you what part of the program is slow) and familiarize yourself with one. Once you drill down enough, when you can't improve speed by adjusting what the source code says to do, you will have to begin analyzing machine instructions. This requires familiarizing yourself with the details of your CPU and its instruction set. Only then can you usefully begin adjusting trivial differences in how the source code says to do small things.

If you don't know much about how your CPU executes instructions, don't jump to optimizing that sort of thing. It's a complete waste of time, considering that the big fish are in the algorithm and overall structure of the program.

Upvotes: 7

Bartek Banachewicz

Reputation: 39390

Premature optimizations are the root of all evil.

It's mostly premature optimization. It's also a micro-optimization. As such, it requires more knowledge about the Vector type and its intended usage, your compiler, and a lot of other factors.

These two also aren't equivalent: the first (non-const reference) version won't accept rvalues and allows the function to modify the caller's vector. You should use const& to make them really similar.
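A minimal sketch of the binding rules, assuming a hypothetical four-float `Vector` struct as a stand-in for the question's type (all function names here are made up for illustration). In standard C++ a temporary cannot bind to a non-const lvalue reference, while it can bind to `const&` or initialize a by-value parameter.

```cpp
#include <cassert>

// Hypothetical stand-in for the question's Vector type.
struct Vector { float x, y, z, w; };

float sum_by_ref(Vector &v)             { return v.x + v.y + v.z + v.w; }
float sum_by_const_ref(const Vector &v) { return v.x + v.y + v.z + v.w; }
float sum_by_value(Vector v)            { return v.x + v.y + v.z + v.w; }

// Returns a temporary (an rvalue) at the call site.
Vector make_vector() { return Vector{1.0f, 2.0f, 3.0f, 4.0f}; }

void demo_binding()
{
    Vector named = make_vector();
    assert(sum_by_ref(named) == 10.0f);                // lvalue binds to Vector&
    // sum_by_ref(make_vector());                      // ill-formed in standard C++:
    //                                                 // a temporary cannot bind to Vector&
    assert(sum_by_const_ref(make_vector()) == 10.0f);  // temporary binds to const Vector&
    assert(sum_by_value(make_vector()) == 10.0f);      // temporary initializes the parameter
}
```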

You said that it's a D3D app; in that case (except for precomputations), you really want to be doing vector and matrix calculations on your GPU. A simple profiler won't help with that; you need to profile both CPU and GPU code.

And as @Potatoswatter noticed, this is a type that can be handled more efficiently when passed by value than when passed by reference.

Upvotes: 0

Mats Petersson

Reputation: 129524

Edit: See bottom for specifics on Vector that is 16 bytes long.

It is very likely that the first one is significantly faster if the vector has more than a few elements (or the elements are themselves quite large).

However, "the devil is in the detail" as they say. It's possible that, under some specific circumstances, the second case is indeed faster. That would be an exception rather than the rule, but it's still a possibility.

In the second case, the vector is being copied [unless the compiler can inline the code AND the compiler can realise what is going on, and remove the extra copy]. If the vector has 10000 elements, that's 10000 copies of whatever is in the vector.

In the first case, all that is passed from the calling function to the called function is a single pointer. On the other hand, since it's a reference, the generated code would have to make one more memory reference to read the content. So if the vector is very small, and the test function is doing quite a few accesses to the vec variable, it is possible that the extra overhead of the indirection is "worse" than the copy of the content.
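To see the copy happen rather than take it on faith, here is a small sketch using a hypothetical instrumented type, `Counted`, that counts its own copy constructions. It is an illustration only and says nothing about how expensive the copy is for your real type.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical instrumented type: counts how often it is copy-constructed.
struct Counted {
    static int copies;
    std::vector<float> data;
    explicit Counted(std::size_t n) : data(n, 1.0f) {}
    Counted(const Counted &other) : data(other.data) { ++copies; }
};
int Counted::copies = 0;

std::size_t size_by_value(Counted c)      { return c.data.size(); }
std::size_t size_by_ref(const Counted &c) { return c.data.size(); }

// Passing a named object by value copy-constructs the parameter once.
int copies_made_by_value()
{
    Counted big(10000);
    Counted::copies = 0;
    size_by_value(big);
    return Counted::copies;
}

// Passing by reference hands over an address; no copy is made.
int copies_made_by_reference()
{
    Counted big(10000);
    Counted::copies = 0;
    size_by_ref(big);
    return Counted::copies;
}

void demo_copies()
{
    assert(copies_made_by_value() == 1);
    assert(copies_made_by_reference() == 0);
}
```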

If in doubt, benchmark the two solutions.

Make sure that the benchmark is representative - you can get it equally wrong by making it 100x faster for 10k elements, and then end up with 2x slower when the number of elements is less than 20 - and the average is 11...
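A rough sketch of such a benchmark, assuming a hypothetical `Vec4` struct and two made-up `dot_*` functions; swap in your real type, functions and data sizes. The timings it prints are only meaningful with optimizations enabled and on inputs that look like your actual workload, and an optimizer may inline both calls and erase the difference entirely.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <vector>

struct Vec4 { float x, y, z, w; };  // hypothetical stand-in type

float dot_by_value(Vec4 a, Vec4 b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z + a.w * b.w;
}

float dot_by_ref(const Vec4 &a, const Vec4 &b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z + a.w * b.w;
}

struct Result { double seconds; float checksum; };

// Times f over every adjacent pair of the input and keeps a checksum so the
// work has a visible result.
template <typename F>
Result time_it(F f, const std::vector<Vec4> &input)
{
    auto start = std::chrono::steady_clock::now();
    float sum = 0.0f;
    for (std::size_t i = 0; i + 1 < input.size(); ++i)
        sum += f(input[i], input[i + 1]);
    auto stop = std::chrono::steady_clock::now();
    return Result{std::chrono::duration<double>(stop - start).count(), sum};
}

// Small integer-valued floats keep the checksum exact for modest n.
std::vector<Vec4> make_input(std::size_t n)
{
    std::vector<Vec4> v;
    v.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        float f = static_cast<float>(i % 7);
        v.push_back(Vec4{f, f, f, f});
    }
    return v;
}

void run_comparison(std::size_t n)
{
    std::vector<Vec4> input = make_input(n);
    Result by_value = time_it(dot_by_value, input);
    Result by_ref   = time_it(dot_by_ref, input);
    assert(by_value.checksum == by_ref.checksum);  // same work, same answer
    std::printf("by value: %f s, by reference: %f s\n",
                by_value.seconds, by_ref.seconds);
}
```

Calling `run_comparison` with several sizes (for example 20 and 10000) is one way to avoid tuning for a case that never occurs in practice.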

Edit: Since the question was updated, I have to add that because the Vector object is quite small, a significant difference between the choices is much less likely. On a 32-bit system, the pass by reference option is likely to still have a small benefit [but, as I said above, it's balanced against more complex access to the Vector content]. On a 64-bit system, it's quite possible that passing two register values is faster than a reference.

Again, benchmark under "normal" type loads.

Upvotes: 1

user207421

Reputation: 311048

You should always pass objects by reference except when you need to pass an address, for example if you also want to allow a null pointer. Passing objects by value implies:

  1. Copying
  2. Object slicing

Neither of which you want to happen.
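For readers who have not run into slicing, a minimal sketch with a hypothetical `Shape`/`Circle` hierarchy. Note that slicing only arises when a derived object is passed where a base object is taken by value, so it would not affect a plain type with no derived classes.

```cpp
#include <cassert>
#include <string>

// Hypothetical class hierarchy, used only to illustrate slicing.
struct Shape {
    virtual ~Shape() = default;
    virtual std::string name() const { return "Shape"; }
};

struct Circle : Shape {
    std::string name() const override { return "Circle"; }
};

// By value: only the Shape part of the argument is copied, so the
// parameter really is a Shape and the Circle override is lost.
std::string name_by_value(Shape s) { return s.name(); }

// By reference: the parameter refers to the original Circle object.
std::string name_by_ref(const Shape &s) { return s.name(); }

void demo_slicing()
{
    Circle c;
    assert(name_by_value(c) == "Shape");   // sliced
    assert(name_by_ref(c) == "Circle");    // dynamic type preserved
}
```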

Upvotes: 0

Perfervor

Reputation: 9

A vector argument passed by reference would be faster, more so in case of a vector with many elements in it. That way you're simply avoiding the time spent in making a local copy.

Upvotes: 0
