Reputation: 147
I'm trying to convert a piece of code from SSE to ARM NEON for optimization. For most of the SSE instructions in the code I found clearly equivalent NEON ones, but I've got problems with these:
result1_shifted = _mm_srli_si128 (result1, 1);
result=_mm_packus_epi16 (res1,res2);
_mm_storeu_si128 (p_dest, result);
Could you please help me?
Upvotes: 3
Views: 6227
Reputation: 1998
I agree with the comments that it's probably a good idea to go back to a "C" (or anything, really) reference design and maybe start from scratch. In particular, you may find that NEON has better ways of doing some things. But if you find that you need to do nearly identical things, here are some hints:
_mm_srli_si128 (result1, 1);
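Try VEXT.8 Qdst, Qsrc, Qsrc2, #1, where Qsrc2 has been cleared to 0. This takes bytes 1 through 15 of Qsrc and appends the lowest byte of Qsrc2, which gives a one-byte right shift with zero fill.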
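With intrinsics, the same idea can be sketched roughly as below. The helper name `shift_right_one_byte` is made up for illustration, and the scalar fallback is only there so the snippet also builds on non-ARM machines for testing; treat it as a starting point rather than a drop-in replacement.

```c
#include <stdint.h>
#include <string.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: mimics _mm_srli_si128(x, 1) on a 16-byte block.
   Byte 0 (lowest address) is dropped and a zero byte enters at the top. */
static void shift_right_one_byte(const uint8_t src[16], uint8_t dst[16])
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    uint8x16_t v    = vld1q_u8(src);
    uint8x16_t zero = vdupq_n_u8(0);
    /* VEXT: bytes 1..15 of v followed by byte 0 of zero */
    vst1q_u8(dst, vextq_u8(v, zero, 1));
#else
    /* Portable reference path with the same result */
    memmove(dst, src + 1, 15);
    dst[15] = 0;
#endif
}
```

Comparing the NEON path against the scalar path on a few inputs is a cheap way to confirm the byte order matches what the SSE code expected.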
_mm_packus_epi16 (res1,res2);
Try VQMOVUN.S16 Ddst, Qsrc, once per input register, to fill the two halves of the destination. The key word when looking for alternatives is "narrow": you are moving with narrowing. "Q" in the mnemonic is NEON nomenclature for saturation. Note that plain VQMOVN.S16 saturates signed to signed, whereas _mm_packus_epi16 saturates signed 16-bit values to unsigned 8-bit; VQMOVUN is the signed-to-unsigned form. Even so, this is exactly why having a reference and tests is good!
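For what it's worth, NEON does provide a signed-to-unsigned saturating narrow (VQMOVUN, intrinsic `vqmovun_s16`), so the pack can be sketched as two narrows plus a combine. The function name `pack_s16_to_u8_saturate` is hypothetical, and the scalar branch is an assumption added only so the snippet can be checked on a non-ARM host.

```c
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#else
/* Scalar reference: clamp a signed 16-bit value into 0..255 */
static uint8_t saturate_u8(int16_t v)
{
    if (v < 0)   return 0;
    if (v > 255) return 255;
    return (uint8_t)v;
}
#endif

/* Hypothetical helper: mimics _mm_packus_epi16(lo, hi).
   lo fills result bytes 0..7, hi fills result bytes 8..15. */
static void pack_s16_to_u8_saturate(const int16_t lo[8],
                                    const int16_t hi[8],
                                    uint8_t dst[16])
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    uint8x8_t a = vqmovun_s16(vld1q_s16(lo)); /* signed -> unsigned, saturating */
    uint8x8_t b = vqmovun_s16(vld1q_s16(hi));
    vst1q_u8(dst, vcombine_u8(a, b));
#else
    for (int i = 0; i < 8; i++) {
        dst[i]     = saturate_u8(lo[i]);
        dst[i + 8] = saturate_u8(hi[i]);
    }
#endif
}
```

Negative inputs come out as 0 and anything above 255 comes out as 255, which is the behaviour to check against the SSE reference.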
_mm_storeu_si128 (__m128i *p, __m128i a);
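Obviously there is VSTM, and there are lots of options here. VST1.8 is the usual choice when the destination may be unaligned, since it only requires element (byte) alignment. You probably want to look at this in some detail.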
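One of those options, sketched with intrinsics: `vst1q_u8` writes 16 bytes through a byte pointer, so it is a reasonable stand-in for an unaligned 128-bit store. The helper name `store_unaligned_128` is invented here, and the `memcpy` branch is only a portable fallback for testing off-target.

```c
#include <stdint.h>
#include <string.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: mimics _mm_storeu_si128 by writing 16 bytes
   to p_dest, which does not need to be 16-byte aligned. */
static void store_unaligned_128(uint8_t *p_dest, const uint8_t value[16])
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    uint8x16_t v = vld1q_u8(value);
    vst1q_u8(p_dest, v); /* byte-element store, no alignment hint given */
#else
    memcpy(p_dest, value, 16);
#endif
}
```

In real code the value would normally already live in a `uint8x16_t`, so only the `vst1q_u8` call would remain.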
Upvotes: 3