Reputation: 8865

Cache performance of vectors, matrices and quaternions

On a number of occasions I've seen C and C++ code that uses the following layout for these structures:

class Vector3
{
    float components[3];
    //etc.
};

class Matrix4x4
{
    float components[16];
    //etc.
};

class Quaternion
{
    float components[4];
    //etc.
};

My question is: will this lead to better cache performance than, say, this:

class Quaternion
{
    float x;
    float y;
    float z;
    //etc.
};

I'd assume the class members and functions are in contiguous memory either way. I currently use the latter form because I find it more convenient (though I can also see the practical sense in the array form, since it lets you treat axes as arbitrary, depending on the operation being performed).

After taking some advice from the respondents, I tested the difference, and the array version is actually slower: I get about a 3% difference in framerate. I implemented operator[] to wrap the array access inside Vector3. I'm not sure whether this has anything to do with it, but I doubt it, since that should be inlined anyway. The only other factor I could see was that I could no longer use a constructor initializer list in Vector3(x, y, z). However, when I changed the original version to no longer use a constructor initializer list, it ran only very marginally slower than before (less than 0.05%). I have no clue why, but at least now I know the original approach was faster.
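For reference, here is a minimal sketch of what an array-backed Vector3 with an operator[] wrapper might look like. It is not the code that was benchmarked above; the dot() helper is hypothetical, and the sketch assumes a C++11 or later compiler, where an array member can still be set from the constructor initializer list using braces.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical array-backed vector, written only to illustrate the idea.
class Vector3
{
public:
    // C++11 and later: the array member can be brace-initialized here.
    Vector3(float x, float y, float z) : components{x, y, z} {}

    // Index-based access; the assert is a debug-only bounds check.
    float& operator[](std::size_t i)
    {
        assert(i < 3);
        return components[i];
    }

    const float& operator[](std::size_t i) const
    {
        assert(i < 3);
        return components[i];
    }

private:
    float components[3];
};

// Dot product written against the index interface, so the axes are
// treated uniformly instead of being named x, y and z.
inline float dot(const Vector3& a, const Vector3& b)
{
    float sum = 0.0f;
    for (std::size_t i = 0; i < 3; ++i)
        sum += a[i] * b[i];
    return sum;
}
```

With optimizations enabled, a wrapper this small would normally be inlined, so any measured difference is more likely to come from elsewhere. Comparing the generated assembly of both variants is one way to check.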

Upvotes: 5

Answers (3)

npclaudiu

Reputation: 2451

I am not sure whether the compiler manages to optimise code better when an array is used in this context (think of unions, for example), but with APIs like OpenGL it can be an optimisation to call functions like

void glVertex3fv(const GLfloat* v);

instead of calling

void glVertex3f(GLfloat x, GLfloat y, GLfloat z);

because, in the latter case, each parameter is passed by value, whereas in the first case only a pointer to the whole array is passed, and the function can decide what to copy and when, which reduces unnecessary copy operations.
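As a rough sketch of the two calling styles: submitVertex3f and submitVertex3fv below are hypothetical stand-ins for glVertex3f and glVertex3fv, so the example compiles without the OpenGL headers. The Vector3 type and its data() helper are assumptions made for illustration as well.

```cpp
// Records the last submitted vertex so the effect of a call is observable.
float g_lastVertex[3] = {0.0f, 0.0f, 0.0f};

// Hypothetical stand-in for glVertex3f: three floats passed by value.
void submitVertex3f(float x, float y, float z)
{
    g_lastVertex[0] = x;
    g_lastVertex[1] = y;
    g_lastVertex[2] = z;
}

// Hypothetical stand-in for glVertex3fv: one pointer to three floats.
void submitVertex3fv(const float* v)
{
    for (int i = 0; i < 3; ++i)
        g_lastVertex[i] = v[i];
}

// Array-backed vector: its storage can be handed to the pointer variant as-is.
struct Vector3
{
    float components[3];
    const float* data() const { return components; }
};

// Each component is read and passed separately.
void drawByValue(const Vector3& p)
{
    submitVertex3f(p.components[0], p.components[1], p.components[2]);
}

// Only the address of the existing array is passed.
void drawByPointer(const Vector3& p)
{
    submitVertex3fv(p.data());
}
```

Whether the pointer form is measurably faster depends on the driver and the calling convention, so treat it as something to profile rather than a guaranteed gain.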

Upvotes: 1

onit

Reputation: 6376

I imagine the performance difference from an optimization like this is minimal. I would say something like this falls under premature optimization for most code. However, if you plan to do vector processing over your structs, say by using CUDA, struct composition makes an important difference. See page 23 of this if interested: http://www.eecis.udel.edu/~mpellegr/eleg662-09s/li.pdf
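One layout choice that commonly matters for vector processing is array-of-structs versus struct-of-arrays. Whether that is exactly what the linked material covers is an assumption on my part, and the sketch below is plain C++ rather than CUDA; all type and function names are made up for illustration.

```cpp
#include <cstddef>
#include <vector>

// Array of structs (AoS): x, y and z of one element are adjacent in memory.
struct Vec3
{
    float x;
    float y;
    float z;
};

// Reads only x, but strides over y and z of every element.
inline float sumXAoS(const std::vector<Vec3>& points)
{
    float sum = 0.0f;
    for (std::size_t i = 0; i < points.size(); ++i)
        sum += points[i].x;
    return sum;
}

// Struct of arrays (SoA): all x values are adjacent, then all y, then all z.
struct Vec3Arrays
{
    std::vector<float> x;
    std::vector<float> y;
    std::vector<float> z;
};

// Reads one densely packed array, which suits SIMD lanes or GPU threads
// that each process a neighbouring element.
inline float sumXSoA(const Vec3Arrays& points)
{
    float sum = 0.0f;
    for (std::size_t i = 0; i < points.x.size(); ++i)
        sum += points.x[i];
    return sum;
}
```

For code that mostly touches whole vectors one at a time, the AoS form is usually fine; the SoA form tends to pay off when a loop touches a single component across many elements.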

Upvotes: 1

Björn Pollex

Reputation: 76876

These declarations are not equivalent with respect to memory layout.

class Quaternion
{
    float components[4];
    //etc.
};

The above guarantees that the elements are contiguous in memory, whereas if they are individual members, as in your last example, the compiler is allowed to insert padding between them (for instance, to align the members with certain address patterns).

Whether this results in better or worse performance depends mostly on your compiler, so you'd have to profile it.
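To see what a particular compiler actually does, compile-time checks along these lines can help. The type names QuaternionArray and QuaternionMembers are made up for illustration, and the sketch assumes C++11 or later. The checks on the member version are not guaranteed by the standard; they are expected to pass on mainstream compilers because all members share the same type and alignment, and a failed check simply stops the build.

```cpp
#include <cstddef>  // offsetof

struct QuaternionArray
{
    float components[4];
};

struct QuaternionMembers
{
    float x;
    float y;
    float z;
    float w;
};

// Guaranteed: array elements are contiguous, so the array is exactly 4 floats.
static_assert(sizeof(QuaternionArray::components) == 4 * sizeof(float),
              "array elements must be contiguous");

// Not guaranteed, but typical: no padding between same-typed members.
static_assert(offsetof(QuaternionMembers, y) == 1 * sizeof(float),
              "unexpected padding before y");
static_assert(offsetof(QuaternionMembers, z) == 2 * sizeof(float),
              "unexpected padding before z");
static_assert(offsetof(QuaternionMembers, w) == 3 * sizeof(float),
              "unexpected padding before w");

// If this holds, both forms occupy the same number of bytes.
static_assert(sizeof(QuaternionMembers) == sizeof(QuaternionArray),
              "the two layouts differ in size");
```

If all of these pass, both forms have the same footprint in memory, and any performance difference has to come from the generated code rather than from the data layout.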

Upvotes: 3
