Reputation: 1579
In some code I have converted to SSE I perform some ray tracing, tracing 4 rays at a time using __m128 data types.
In the method where I determine which objects are hit first, I loop through all objects, test for intersection and create a mask representing which rays had an intersection earlier than previously found.
I also need to maintain data on the id of the objects which correspond to the best hit times. I do this by maintaining a __m128 data type called objectNo and I use the mask determined from the intersection times to update objectNo as follows:
objectNo = _mm_blendv_ps(objectNo,_mm_set1_ps((float)pobj->getID()),mask);
Where pobj->getID() will return an integer representing the id of the current object. Making this cast and using the blend seemed to be the most efficient way of updating the objectNo for all 4 rays.
After all intersections are tested I try to extract the objectNo's individually and use them to access an array to register the intersection. Most commonly I have tried this:
int o0 = _mm_extract_ps(objectNo, 0);
prv_noHits[o0]++;
However this crashes with EXC_BAD_ACCESS as extracting a float with value 1.0 converts to an int of value 1065353216.
How do I correctly unpack the __m128 into ints which can be used to index an array?
Upvotes: 3
Views: 3382
Reputation: 471279
There are two SSE2 conversion intrinsics which seem to do what you want:
_mm_cvtps_epi32()
_mm_cvttps_epi32()
These will convert 4 single-precision FP values to 4 32-bit integers. The first one rounds according to the current rounding mode (round-to-nearest by default). The second one truncates toward zero.
So they can be used like this:
int o0 = _mm_extract_epi32(_mm_cvtps_epi32(objectNo), 0);
prv_noHits[o0]++;
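To illustrate the difference between the two conversions, and to get all four lanes out with a single store instead of one extract per lane, here is a minimal sketch. The helper names lanes_rounded and lanes_truncated are hypothetical, and the code assumes only SSE2 and the default MXCSR rounding mode.

```cpp
#include <emmintrin.h>  // SSE2: _mm_cvtps_epi32, _mm_cvttps_epi32, _mm_storeu_si128

// Hypothetical helper: convert four floats to ints using the current
// rounding mode (round-to-nearest-even unless MXCSR has been changed).
inline void lanes_rounded(__m128 v, int out[4]) {
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out), _mm_cvtps_epi32(v));
}

// Hypothetical helper: same conversion, but truncating toward zero.
inline void lanes_truncated(__m128 v, int out[4]) {
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out), _mm_cvttps_epi32(v));
}
```

Once the lanes are stored, each out[i] can be used as an array index. For object IDs that were stored as exact integer-valued floats, both conversions should give the same result.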
EDIT: Based on what you're trying to do, I feel this can be better optimized as follows:
__m128i ids = _mm_set1_epi32(pobj->getID());
// objectNo is now an __m128i, and the mask will need to change to an __m128i as well
objectNo = _mm_blendv_epi8(objectNo,ids,mask);
int o0 = _mm_extract_epi32(objectNo, 0);
prv_noHits[o0]++;
This version gets rid of the unnecessary conversions. But you will need to use a different mask vector: an __m128i whose selected 32-bit lanes have all bits set.
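An integer blend of this kind can also be written with plain SSE2 bitwise operations, which may help on targets without SSE4.1. The sketch below is not the blend intrinsic itself but an AND/ANDNOT/OR select. The names select_epi32 and update_ids are hypothetical, and it assumes the mask comes from a float comparison such as _mm_cmplt_ps, so each lane is either all ones or all zeros.

```cpp
#include <emmintrin.h>  // SSE2

// Per-lane select: takes b where the mask lane is all ones,
// and a where the mask lane is all zeros.
inline __m128i select_epi32(__m128i a, __m128i b, __m128i mask) {
    return _mm_or_si128(_mm_and_si128(mask, b),
                        _mm_andnot_si128(mask, a));  // (~mask) & a
}

// Hypothetical helper: write `id` into every lane whose new hit time
// is earlier than the best hit time found so far.
inline __m128i update_ids(__m128i objectNo, int id, __m128 tNew, __m128 tBest) {
    // Reinterpret the float comparison mask as an integer mask (no instruction emitted).
    __m128i mask = _mm_castps_si128(_mm_cmplt_ps(tNew, tBest));
    return select_epi32(objectNo, _mm_set1_epi32(id), mask);
}
```

Because the comparison result is only reinterpreted, the same intersection test can drive both the float blend for hit times and the integer select for IDs.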
EDIT 2: Here's a way so that you won't have to change your mask:
__m128 ids = _mm_castsi128_ps(_mm_set1_epi32(pobj->getID()));
objectNo = _mm_blendv_ps(objectNo,ids,mask);
int o0 = _mm_extract_ps(objectNo, 0);
prv_noHits[o0]++;
Note that the _mm_castsi128_ps() intrinsic doesn't map to any instruction. It's just a bit-wise datatype conversion from __m128i to __m128 to get around the "typeness" in C/C++.
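A small sketch of why the bit-wise cast works where the value conversion does not: an ID stored through the cast reads back unchanged, while an ID stored through a float conversion reads back as the float's raw bit pattern. The function names are hypothetical, and only SSE2 is assumed.

```cpp
#include <emmintrin.h>  // SSE2

// Store the ID's bits in a float vector via the cast, then read lane 0
// back as an integer. The bits are never interpreted as a float value.
inline int roundtrip_lane0(int id) {
    __m128 asFloat = _mm_castsi128_ps(_mm_set1_epi32(id));
    return _mm_cvtsi128_si32(_mm_castps_si128(asFloat));
}

// Store the ID converted to a float value, then read lane 0's raw bits.
// This mirrors the situation described in the question.
inline int converted_bits_lane0(int id) {
    __m128 asFloat = _mm_set1_ps(static_cast<float>(id));
    return _mm_cvtsi128_si32(_mm_castps_si128(asFloat));
}
```

For an ID of 1, the first function returns 1 and the second returns 1065353216, which is the IEEE-754 encoding of 1.0f.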
Upvotes: 4