luowenfeng

Reputation: 21

How to optimize a[i] = b[c[i]] with NEON

I have a very simple loop, but n is large:

for (i=0; i<n; i++)
{
    dst[i] = src[table[i]];
}

I want to optimize it using NEON, but I don't know how to deal with the indexed load src[table[i]]. Can this be optimized? If so, how?

Upvotes: 0

Views: 126

Answers (1)

luowenfeng

Reputation: 21

Thanks to @Paul R for his comment:

This is effectively a gathered load, and is not supported in NEON. See: stackoverflow.com/questions/11502332/…

Since it couldn't be optimized with NEON, I tried OpenMP and got a significant improvement. The code is rather simple, too:

#pragma omp parallel for
for (i=0; i<n; i++)
{
    dst[i] = src[table[i]];
}
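For anyone who wants to try this, here is a minimal sketch of the same loop wrapped in a function. The function name gather_copy and the element and index types (uint8_t and uint32_t) are my assumptions for illustration; the snippet above does not state them. It also assumes dst does not overlap src or table, and that every table[i] is a valid index into src.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: the name and the types are assumptions,
   not taken from the original loop. */
void gather_copy(uint8_t *dst, const uint8_t *src,
                 const uint32_t *table, size_t n)
{
    /* Signed counter, because OpenMP versions before 3.0
       only accept signed loop variables. */
    long i;

    /* Each iteration writes only dst[i] and reads src and table,
       so iterations are independent and can be split across threads. */
#pragma omp parallel for
    for (i = 0; i < (long)n; i++) {
        dst[i] = src[table[i]];
    }
}
```

With GCC or Clang this needs -fopenmp on the command line. Without that flag the pragma is ignored and the loop runs serially with the same result.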

Upvotes: 1
