Reputation: 2124
I have some trouble with a "special" kind of conditional structure in SSE/C++. The following pseudo code illustrates what I want to do:
for-loop ...
// some SSE calculations
__m128i a = ... // a contains four 32-bit ints
__m128i b = ... // b contains four 32-bit ints
if any of the four ints in a is less than its corresponding int in b
vector.push_back(e.g. first component of a)
So I do quite a few SSE calculations, and as a result I have two __m128i values, each containing four 32-bit integers. This part is working fine. But now I want to push something into a vector if at least one of the four ints in a is less than the corresponding int in b. I have no idea how to achieve this.
I know the _mm_cmplt_epi32 function, but so far I have failed to use it to solve my specific problem.
EDIT:
Yeah, actually I'm searching for a clever solution. I have a solution, but that looks very, very strange.
for-loop ...
// some SSE calculations
__m128i a = ... // a contains four 32-bit ints
__m128i b = ... // b contains four 32-bit ints
long long i[2] __attribute__((aligned (16)));
__m128i cmp = _mm_cmplt_epi32(a, b);
_mm_store_si128(reinterpret_cast<__m128i*>(i), cmp);
if(i[0] || i[1]) {
vector.push_back(...);
}
I hope there is a better way...
Upvotes: 1
Views: 1891
Reputation: 33669
I did something similar to find prime numbers; see "Finding lists of prime numbers with SIMD - SSE/AVX".
This is only going to be useful with SSE if the result of the comparison is false most of the time. Otherwise you should just use scalar code. Let me try to lay out the code.
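__m128i cmp = _mm_cmplt_epi32(a, b);
if(_mm_movemask_epi8(cmp)) { // nonzero if any lane of a is less than its lane of b
int out[4] __attribute__((aligned (16)));
// keep the lanes of a where the comparison is true, zero the others
_mm_store_si128(reinterpret_cast<__m128i*>(out), _mm_and_si128(cmp, a));
for(int i=0; i<4; i++) if(out[i]) vector.push_back(out[i]);
}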
You could store the comparison result instead of using the logical AND. Alternatively, you could test the individual bits of the move mask and skip the store. Whichever way you do it, what really matters is that the move mask is zero most of the time; otherwise SSE won't be helpful.
In my case a was a list of numbers I wanted to test for primality and b was a list of divisors. Since I knew that most of the time the values of a were not prime, this gave me a boost of about 3x (out of a maximum of 4x with SSE).
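The idea of testing the bits of the move mask can be sketched as follows. This is only an illustration under a few assumptions: the helper name push_lanes_less_than is hypothetical, it uses _mm_movemask_ps (one bit per 32-bit lane) rather than _mm_movemask_epi8, and it still stores a once to read the lanes. Selecting lanes by mask bit means a lane whose value is zero is not lost, unlike a test on the masked value itself.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <vector>

// Hypothetical helper: appends every 32-bit lane of a that is less than
// the corresponding lane of b (signed comparison).
inline void push_lanes_less_than(__m128i a, __m128i b, std::vector<int>& out) {
    __m128i cmp = _mm_cmplt_epi32(a, b);
    // Bit i of mask is set when lane i of a is less than lane i of b.
    int mask = _mm_movemask_ps(_mm_castsi128_ps(cmp));
    if (mask == 0) return;  // expected to be the common case

    alignas(16) int lanes[4];
    _mm_store_si128(reinterpret_cast<__m128i*>(lanes), a);
    for (int i = 0; i < 4; ++i)
        if (mask & (1 << i)) out.push_back(lanes[i]);
}
```

The early return keeps the hot path to one compare, one move mask, and one branch, which is where the speedup comes from when matches are rare.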
Upvotes: 1
Reputation: 1294
You want to use the _mm_movemask_ps function, which returns a bitmask (one bit per 32-bit lane) that you can test:
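__m128i cmp = _mm_cmplt_epi32(a, b);
// _mm_movemask_ps takes a __m128, so reinterpret the integer mask (the cast generates no instruction)
if(_mm_movemask_ps(_mm_castsi128_ps(cmp)))
{
vector.push_back(...);
}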
Documented here: http://msdn.microsoft.com/en-us/library/4490ys29%28v=vs.90%29.aspx
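As a small self-contained sketch of what the bitmask looks like, the function below returns it directly. The name less_than_mask is hypothetical and not part of any library; the only assumption is SSE2 support.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics

// Hypothetical helper: returns a 4-bit mask in which bit i is set when
// lane i of a is less than lane i of b (signed 32-bit comparison).
inline int less_than_mask(__m128i a, __m128i b) {
    __m128i cmp = _mm_cmplt_epi32(a, b);  // each lane is all ones or all zeros
    // _mm_movemask_ps collects the top bit of each 32-bit lane.
    return _mm_movemask_ps(_mm_castsi128_ps(cmp));
}
```

A nonzero result answers the "any of the four" question, and the individual bits tell you which lanes matched.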
Upvotes: 4