Reputation: 21
I have a very simple loop, but n is large:
for (i = 0; i < n; i++)
{
    dst[i] = src[table[i]];
}
I want to optimize it using NEON, but I don't know how to deal with the indexed load src[table[i]].
Is it possible to optimize? If yes, how?
Upvotes: 0
Views: 126
Reputation: 21
Thanks to @Paul R for his comment:
This is effectively a gathered load, and is not supported in NEON. See: stackoverflow.com/questions/11502332/…
Since the loop couldn't be optimized with NEON, I tried OpenMP instead and got a significant improvement. The code is rather simple, too:
#pragma omp parallel for
for (i = 0; i < n; i++)
{
    dst[i] = src[table[i]];
}
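For anyone who wants to try this, here is a minimal self-contained sketch of the same loop wrapped in a function. The function name gather_omp and the element types (uint8_t data, uint32_t indices) are my assumptions; the original post does not say what types dst, src and table have. It also assumes dst does not overlap src or table, so every iteration is independent.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: the name and the element types are assumptions.
 * Computes dst[i] = src[table[i]] for i in [0, n).
 * Assumes dst does not overlap src or table, and that every
 * table[i] is a valid index into src. */
void gather_omp(uint8_t *dst, const uint8_t *src,
                const uint32_t *table, long n)
{
    long i;

    assert(n >= 0);

    /* Each iteration writes only dst[i], so iterations can run on
     * different threads. The loop variable is private to each thread
     * automatically. Without OpenMP enabled, the pragma is ignored
     * and the loop simply runs serially. */
    #pragma omp parallel for schedule(static)
    for (i = 0; i < n; i++) {
        dst[i] = src[table[i]];
    }
}
```

With GCC, OpenMP has to be enabled at compile time with -fopenmp. How much it helps will probably depend on the core count and on how cache-friendly the indices in table are, so it is worth measuring on the target device.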
Upvotes: 1