user1494080

Reputation: 2124

Conditional structures in SSE

I have some trouble with a "special" kind of conditional structure in SSE/C++. The following pseudo code illustrates what I want to do:

    for-loop ...
        // some SSE calculations
        __m128i a = ... // a contains four 32-bit ints
        __m128i b = ... // b contains four 32-bit ints

        if any of the four ints in a is less than its corresponding int in b
            vector.push_back(e.g. first component of a)

So I do quite a few SSE calculations, and as the result of these calculations I have two __m128i values, each containing four 32-bit integers. This part is working fine. But now I want to push something into a vector if at least one of the four ints in a is less than the corresponding int in b. I have no idea how to achieve this.

I know the _mm_cmplt_epi32 function, but so far I failed to use it to solve my specific problem.

EDIT:

Yeah, actually I'm searching for a clever solution. I have a solution, but that looks very, very strange.

    for-loop ...
        // some SSE calculations
        __m128i a = ... // a contains four 32-bit ints
        __m128i b = ... // b contains four 32-bit ints

        long long i[2] __attribute__((aligned (16)));

        // Each lane of cmp is all ones where a < b, otherwise zero
        __m128i cmp = _mm_cmplt_epi32(a, b);
        _mm_store_si128(reinterpret_cast<__m128i*>(i), cmp);

        // Any nonzero half means at least one lane compared true
        if (i[0] || i[1]) {
            vector.push_back(...);
        }

I hope there is a better way...

Upvotes: 1

Views: 1891

Answers (2)

Z boson

Reputation: 33669

I did something similar to this to find prime numbers; see "Finding lists of prime numbers with SIMD - SSE/AVX".

This is only going to be useful with SSE if the result of the comparison is false most of the time. Otherwise you should just use scalar code. Let me try and lay out the code.

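    __m128i cmp = _mm_cmplt_epi32(a, b);
    if (_mm_movemask_epi8(cmp)) {
        int out[4] __attribute__((aligned (16)));
        // Keep only the lanes of a that passed the comparison; the others become 0
        _mm_store_si128((__m128i*)out, _mm_and_si128(cmp, a));
        // Note: a passing lane whose value is 0 is skipped by this test as well
        for (int i = 0; i < 4; i++) if (out[i]) vector.push_back(out[i]);
    }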

You could store the comparison result itself instead of using the logical AND. Alternatively, you could test the per-lane bits in the movemask directly and skip the store. Whichever way you do it, what really matters is that the movemask is zero most of the time; otherwise SSE won't be helpful.

In my case, a was a list of numbers I wanted to test for primality and b was a list of divisors. Since I knew that most of the values of a were not prime, this gave me a boost of about 3x (out of a maximum of 4x with SSE).
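The idea of testing the movemask bits directly and skipping the store can be sketched roughly as follows. This is only one possible reading of the suggestion, not the code from the prime-number project. The function name append_lanes_less_than, the array-based interface, and the requirement that n is a multiple of 4 are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <emmintrin.h>  // SSE2 intrinsics
#include <vector>

// Hypothetical helper (not from the original answer): appends every a[i]
// for which a[i] < b[i], four elements per iteration.
void append_lanes_less_than(const int* a, const int* b, std::size_t n,
                            std::vector<int>& out) {
    assert(n % 4 == 0);  // assumption: this sketch has no scalar tail loop
    for (std::size_t i = 0; i < n; i += 4) {
        __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a + i));
        __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b + i));

        // One mask bit per byte, so each 32-bit lane owns four adjacent bits.
        int mask = _mm_movemask_epi8(_mm_cmplt_epi32(va, vb));
        if (mask == 0) continue;  // hoped-for common case: nothing to push

        // Walk the four lanes without storing the vector to memory:
        // test the lowest mask bit, read the lowest lane, then shift both.
        for (int lane = 0; lane < 4; ++lane) {
            if (mask & 1) out.push_back(_mm_cvtsi128_si32(va));
            mask >>= 4;                   // move to the next lane's bits
            va = _mm_srli_si128(va, 4);   // shift the next lane into position
        }
    }
}
```

Because the decision is taken from the mask bits rather than from the masked values, a passing lane that happens to contain 0 is pushed too. The benefit still depends on the mask being zero for most iterations.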

Upvotes: 1

beerboy

Reputation: 1294

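You want to use the _mm_movemask_ps function, which returns a 4-bit mask (one bit per 32-bit lane) that you can test. It takes a __m128, so the integer comparison result has to be cast with _mm_castsi128_ps first:

    __m128i cmp = _mm_cmplt_epi32(a, b);

    // Nonzero if at least one lane of a is less than the matching lane of b
    if (_mm_movemask_ps(_mm_castsi128_ps(cmp)))
    {
        vector.push_back(...);
    }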

Documented here: http://msdn.microsoft.com/en-us/library/4490ys29%28v=vs.90%29.aspx
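For completeness, here is a minimal self-contained sketch of this test applied inside a loop like the one in the question. The function name append_first_if_any_less, the array inputs, and the choice to push the first component of a are assumptions for illustration; the question leaves open what is actually pushed.

```cpp
#include <cassert>
#include <cstddef>
#include <emmintrin.h>  // SSE2 intrinsics (also provides _mm_movemask_ps)
#include <vector>

// Hypothetical helper: for each group of four ints, pushes the first
// component of a if any lane of a is less than the matching lane of b.
void append_first_if_any_less(const int* a, const int* b, std::size_t n,
                              std::vector<int>& out) {
    assert(n % 4 == 0);  // assumption: this sketch has no scalar tail loop
    for (std::size_t i = 0; i < n; i += 4) {
        __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a + i));
        __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b + i));

        // Each lane of cmp is all ones where a < b, otherwise zero.
        __m128i cmp = _mm_cmplt_epi32(va, vb);

        // The cast only reinterprets the bits; movemask then collects the
        // sign bit of each 32-bit lane into the low four bits of an int.
        if (_mm_movemask_ps(_mm_castsi128_ps(cmp)) != 0) {
            out.push_back(_mm_cvtsi128_si32(va));  // lowest lane of a
        }
    }
}
```

The cast generates no instruction, so the test costs one compare, one movemask, and one branch per group of four.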

Upvotes: 4
