Reputation: 75
I am looking at the Intel Intrinsics Guide.
My SSE experience is fairly limited.
I have an array of ints (a long one, really) named 'source'.
I want to replace each of its values that matches a certain value.
int source[] = {4,5,9,8};
int mask[] = {4,4,4,4};
int replacer[] = {3,3,3,3};
So the final source should look like {3,5,9,8}
I would like to achieve this using SSE < 4.
The closest instruction I came across is _mm_cmpeq_epi32:
FOR j := 0 to 3
    i := j*32
    dst[i+31:i] := ( a[i+31:i] == b[i+31:i] ) ? 0xFFFFFFFF : 0
ENDFOR
Now I would like something to replace the original array with my value, or do nothing otherwise:
FOR j := 0 to 3
    i := j*32
    dst[i+31:i] := ( a[i+31:i] == b[i+31:i] ) ? my_mask_value_here : source_value_untouched
ENDFOR
Is there anything that achieves what I am trying to do? I can't figure it out, even when combining different instructions.
Thanks
Upvotes: 1
Views: 592
Reputation: 58762
Having gotten your mask using PCMPEQD, if you have SSE 4.1 you can use the PBLENDVB instruction, which exists specifically for this purpose. Otherwise, you can use PAND, PANDN and POR to emulate it. MASKMOVDQU can also be used.
Here is the source code demonstrating the 3 ways:
#include <stdio.h>
#include <x86intrin.h>

/* Print the four 32-bit lanes of v. */
static void print4(__m128i v)
{
    int out[4];
    _mm_storeu_si128((__m128i*)out, v);
    printf("%d %d %d %d\n", out[0], out[1], out[2], out[3]);
}

int main()
{
    int source[] = {4,5,9,8};
    int mask[] = {4,4,4,4};
    int replacer[] = {3,3,3,3};
    /* Unaligned loads: plain int arrays are not guaranteed to be 16-byte aligned. */
    __m128i src = _mm_loadu_si128((const __m128i*)source);
    __m128i rep = _mm_loadu_si128((const __m128i*)replacer);
    __m128i bitmask = _mm_cmpeq_epi32(src, _mm_loadu_si128((const __m128i*)mask));
    // manual version (SSE2): (replacer & bitmask) | (~bitmask & source)
    __m128i result = _mm_or_si128(_mm_and_si128(rep, bitmask), _mm_andnot_si128(bitmask, src));
    print4(result);
    // maskmovdqu version (SSE2): writes only the bytes whose mask byte has its high bit set
    int out[4];
    _mm_storeu_si128((__m128i*)out, src);
    _mm_maskmoveu_si128(rep, bitmask, (char*)out);
    print4(_mm_loadu_si128((const __m128i*)out));
    // sse 4.1 version (needs SSE 4.1 enabled, e.g. -msse4.1 with GCC/Clang)
    result = _mm_blendv_epi8(src, rep, bitmask);
    print4(result);
    return 0;
}
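Since the question mentions a long array, here is a minimal sketch of how the PAND/PANDN/POR approach might be wrapped in a loop. The function name replace_equal_sse2 and its parameters are my own illustration, not part of any library. It assumes only SSE2, uses unaligned loads and stores so the array needs no special alignment, and handles the leftover elements (when the length is not a multiple of 4) with a plain scalar loop.

```c
#include <assert.h>
#include <stddef.h>
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical helper: replaces, in place, every element of data that is
   equal to match with replacement, using only SSE2 instructions. */
void replace_equal_sse2(int *data, size_t n, int match, int replacement)
{
    assert(data != NULL || n == 0);

    const __m128i vmatch = _mm_set1_epi32(match);        /* match in all 4 lanes */
    const __m128i vrep   = _mm_set1_epi32(replacement);  /* replacement in all 4 lanes */

    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128i v    = _mm_loadu_si128((const __m128i *)(data + i));
        __m128i eq   = _mm_cmpeq_epi32(v, vmatch);   /* all-ones in lanes that match */
        __m128i keep = _mm_andnot_si128(eq, v);      /* ~eq & v: untouched lanes */
        __m128i put  = _mm_and_si128(eq, vrep);      /* eq & replacement: replaced lanes */
        _mm_storeu_si128((__m128i *)(data + i), _mm_or_si128(keep, put));
    }

    /* Scalar tail for the last n % 4 elements. */
    for (; i < n; ++i) {
        if (data[i] == match)
            data[i] = replacement;
    }
}
```

With the arrays from the question, calling replace_equal_sse2(source, 4, 4, 3) should turn {4,5,9,8} into {3,5,9,8}.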
Upvotes: 4