Reputation: 30645
I have an int array[10000] and I want to iterate from a certain position to find the next non-zero index. Currently I use a basic while loop:
while(array[i] == 0){
pos++;
}
etc
I know with intrinsics I could test 4 integers for zero at a time, but is there a way to return something indicating the vector index of the "first" non-zero?
Upvotes: 0
Views: 1566
Reputation: 213130
It's fairly simple to do this, but throughput improvement may not be great, since you will probably be limited by memory bandwidth (unless your array is already cached):
int index = -1;
for (i = 0; i < n; i += 4)
{
__m128i v = _mm_load_si128(&A[i]);
__m128i vcmp = _mm_cmpeq_epi32(v, _mm_setzero_si128());
int mask = _mm_movemask_epi8(vcmp);
if (mask != 0xffff)
{
break;
}
}
if (i < n)
{
for (j = i; j < i + 4; ++j)
{
if (A[j] != 0)
{
index = j;
break;
}
}
}
This assumes that the array A
is 16 byte aligned, its size, n
, is a multiple of 4, and that the ints are 32 bits.
Loop unrolling by a factor of 2 may help, particularly if your input data is large and/or sparse, e.g.
int index = -1;
for (i = 0; i < n; i += 8)
{
__m128i v0 = _mm_load_si128(&A[i]);
__m128i v1 = _mm_load_si128(&A[i + 4]);
__m128i vcmp0 = _mm_cmpeq_epi32(v0, _mm_setzero_si128());
__m128i vcmp1 = _mm_cmpeq_epi32(v1, _mm_setzero_si128());
int mask0 = _mm_movemask_epi8(vcmp0);
int mask1 = _mm_movemask_epi8(vcmp1);
if ((mask0 | mask1) != 0xffff)
{
break;
}
}
if (i < n)
{
for (j = i; j < i + 8; ++j)
{
if (A[j] != 0)
{
index = j;
break;
}
}
}
If you have AVX2 (Haswell and later) then you can process 8 ints at a time rather than 4.
Upvotes: 3