Reputation: 2061
At the moment I am accessing my float values via a union:
#include <xmmintrin.h>

typedef union
{
    float  v[4];
    __m128 m;
} SSEFloat;
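A minimal sketch of how the union's alignment works, assuming GCC or Clang on x86 and C11 (for static_assert and alignof). The vector type __m128 is itself declared with 16-byte alignment, so any union containing it inherits that alignment and the float array needs no attribute of its own. The helper is_aligned16 is a hypothetical name used only for illustration.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>
#include <xmmintrin.h> /* SSE: __m128 */

/* The union from the question, with the SSE type spelled __m128. */
typedef union
{
    float  v[4];
    __m128 m;
} SSEFloat;

/* __m128 has 16-byte alignment, so the whole union is 16-byte aligned
   and v[] shares that address; this fails to compile otherwise. */
static_assert(alignof(SSEFloat) >= 16, "SSEFloat must be 16-byte aligned");

/* Hypothetical helper: nonzero if p sits on a 16-byte boundary. */
static int is_aligned16(const void *p)
{
    return ((uintptr_t)p % 16) == 0;
}
```

This covers objects the compiler lays out itself (locals, globals, struct members). Heap memory is only as aligned as the allocator guarantees, which is why the answers below discuss wrapper structs and custom allocation.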
But according to this link, performance is lost this way. Is there a performance loss with GCC 4? Do the floats need to be aligned? In the union too? Or is it correct to set the values like this:
SSEFloat a;
float tmp = 10.0;
a.m = _mm_load1_ps( &tmp );
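For comparison, a small sketch (assuming GCC or Clang with the standard SSE header) of the two usual ways to broadcast one float into all four lanes. _mm_load1_ps reads a single float from memory, so the pointer only needs ordinary float alignment; _mm_set1_ps takes the value directly and avoids the temporary. The function names are hypothetical.

```c
#include <assert.h>
#include <xmmintrin.h> /* SSE: __m128, _mm_load1_ps, _mm_set1_ps */

typedef union
{
    float  v[4];
    __m128 m;
} SSEFloat;

/* Hypothetical helper: broadcast via memory, as in the question.
   Only 4 bytes are read, so &x needs no 16-byte alignment. */
static SSEFloat broadcast_load(float x)
{
    SSEFloat a;
    a.m = _mm_load1_ps(&x);
    return a;
}

/* Hypothetical helper: same result, passing the value directly. */
static SSEFloat broadcast_set(float x)
{
    SSEFloat a;
    a.m = _mm_set1_ps(x);
    return a;
}
```

Both leave 10.0f in every element of v when called with 10.0f.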
I also couldn't find the Intel SSE intrinsics documentation. Is there a short list of what to know for speed optimization?
Upvotes: 1
Views: 1179
Reputation: 1910
If you access the floats through the union, the compiler will probably emit non-SSE code for those accesses, which will be a performance hit. How much it matters depends on how you use the object. You can put _MM_ALIGN16 (__declspec(align(16))) in front of a wrapper struct (the GCC equivalent is __attribute__((aligned(16)))) and, if you are coding C++, override the new and delete operators so heap allocations are aligned as well. Check this question: SSE, intrinsics, and alignment
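Since the question is about GCC and C, here is a rough C counterpart of that advice, assuming GCC or Clang on x86. The GCC attribute replaces __declspec(align(16)), and _mm_malloc/_mm_free (which GCC's and Clang's xmmintrin.h make available) stand in for overriding new and delete. Vec4, alloc_vec4s, free_vec4s and load_vec4 are hypothetical names for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <xmmintrin.h> /* __m128, _mm_load_ps, _mm_malloc, _mm_free */

/* Hypothetical wrapper struct; GCC spelling of __declspec(align(16)). */
typedef struct
{
    float v[4];
} __attribute__((aligned(16))) Vec4;

/* Allocate n Vec4 values with 16-byte alignment (NULL on failure).
   Memory from _mm_malloc must be released with _mm_free, not free(). */
static Vec4 *alloc_vec4s(size_t n)
{
    return (Vec4 *)_mm_malloc(n * sizeof(Vec4), 16);
}

static void free_vec4s(Vec4 *p)
{
    _mm_free(p);
}

/* Aligned load; valid for every element of an array from alloc_vec4s. */
static __m128 load_vec4(const Vec4 *p)
{
    assert(((uintptr_t)p % 16) == 0);
    return _mm_load_ps(p->v);
}
```

Because sizeof(Vec4) is a multiple of 16, every element of such an array stays aligned, not just the first.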
Upvotes: 0
Reputation: 6776
The compiler guarantees that the code executes correctly, but it may sacrifice performance for correctness. Since the union only adds syntactic convenience for accessing the individual elements of a 4-float vector, and the __m128 object is (conceptually, if not actually) sitting in a register, I recommend you use the __m128 object directly and move data in and out of it with the _mm_load_ps and _mm_store_ps family of intrinsics.
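A minimal sketch of that style, assuming the data lives in 16-byte aligned float arrays whose length is a multiple of 4 (both requirements of the aligned _mm_load_ps/_mm_store_ps). The function madd4 is a hypothetical example computing out[i] = a[i] * s + b[i]; the __m128 values are plain locals, with no union involved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <xmmintrin.h> /* SSE: __m128 and the _mm_*_ps intrinsics */

/* Hypothetical helper: out[i] = a[i] * s + b[i], four floats at a time.
   Assumes n % 4 == 0 and all arrays are 16-byte aligned. */
static void madd4(float *out, const float *a, const float *b, float s, size_t n)
{
    assert(n % 4 == 0);
    assert(((uintptr_t)out | (uintptr_t)a | (uintptr_t)b) % 16 == 0);

    const __m128 vs = _mm_set1_ps(s);       /* broadcast s once */
    for (size_t i = 0; i < n; i += 4) {
        __m128 va = _mm_load_ps(a + i);     /* aligned load of a[i..i+3] */
        __m128 vb = _mm_load_ps(b + i);
        __m128 r  = _mm_add_ps(_mm_mul_ps(va, vs), vb);
        _mm_store_ps(out + i, r);           /* aligned store of 4 results */
    }
}
```

For unaligned data the same loop works with _mm_loadu_ps and _mm_storeu_ps in place of the aligned variants.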
Comments in the link you supplied suggest that the compiler can optimize poorly around the union, especially with __m128 values. If you want to be sure, run experiments both with and without the union. For high-resolution time measurement on Linux I recommend the pthread_getcpuclockid and clock_gettime APIs. Post your results if you can!
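One way such an experiment could be set up, as a sketch assuming Linux with a POSIX-conforming glibc (older glibc versions need -pthread at link time for pthread_getcpuclockid). It measures the calling thread's CPU time and compares two hypothetical workloads, one going through the union and one using a plain __m128. The workloads are illustrative only; check the generated assembly (gcc -S) before trusting any numbers.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <time.h>
#include <xmmintrin.h>

typedef union
{
    float  v[4];
    __m128 m;
} SSEFloat;

/* CPU time used so far by the calling thread, in seconds (-1.0 on error). */
static double thread_cpu_seconds(void)
{
    clockid_t cid;
    struct timespec ts;
    if (pthread_getcpuclockid(pthread_self(), &cid) != 0 ||
        clock_gettime(cid, &ts) != 0)
        return -1.0;
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Run fn(arg) reps times; return the thread CPU time it took. */
static double time_cpu(void (*fn)(void *), void *arg, int reps)
{
    double t0 = thread_cpu_seconds();
    for (int i = 0; i < reps; ++i)
        fn(arg);
    double t1 = thread_cpu_seconds();
    assert(t0 >= 0.0 && t1 >= 0.0); /* both clock reads succeeded */
    return t1 - t0;
}

/* Hypothetical workload 1: element-wise adds through the union's floats. */
static void work_union(void *arg)
{
    SSEFloat *acc = arg;
    for (int i = 0; i < 1000; ++i)
        for (int k = 0; k < 4; ++k)
            acc->v[k] += 1.0f;
}

/* Hypothetical workload 2: the same adds on a plain __m128. */
static void work_m128(void *arg)
{
    __m128 *acc = arg;
    const __m128 one = _mm_set1_ps(1.0f);
    for (int i = 0; i < 1000; ++i)
        *acc = _mm_add_ps(*acc, one);
}
```

Calling time_cpu(work_union, &u, reps) and time_cpu(work_m128, &m, reps) with the same reps gives two figures to compare; build with the optimization flags you actually ship with.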
In general, for best performance, make things as easy and simple for the compiler as possible. This means keeping performance-critical types like __m128 out of complex constructs like unions; instead, declare them on the stack or in memory allocated expressly for them.
Upvotes: 2