Eddie-Wang

Reputation: 21

How to implement int16 table lookup in Neon?

I want to implement a function like shuffle_epi16 in Neon.

In AVX2 I worked it out by splitting each int16 into two int8 values, calling shuffle_epi8 twice, and then using unpack_epi8 to merge them into the final int16 result. For Neon, is there a better way to implement shuffle_epi16, or is the only option to replace shuffle_epi8 with vtbl1_s8 and unpack_epi8 with vuzp1q_s8?

Upvotes: 1

Views: 67

Answers (1)

solidpixel

Reputation: 12229

NEON doesn't have an arbitrary permute instruction that is a single intrinsic drop-in for the pshuf* family.

For a specific shuffle pattern you can often build it efficiently out of zip/unzip/transpose type operations (vzip, vuzp, vtrn), plus rev and ext. SSE2NEON has some good examples if you want an open-source reference.
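As a minimal sketch of the fixed-sequence approach, assuming a Neon-capable target: reversing eight int16 lanes needs no lookup table at all, only `vrev64q_s16` (reverse within each 64-bit half) followed by `vextq_s16` (rotate by four lanes to swap the halves). `reverse_epi16` is a hypothetical helper name, and the scalar branch exists only so the sketch also builds on non-Arm hosts.

```c
#include <stdint.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical helper (not a real intrinsic): out[i] = src[7 - i]
 * for eight int16 lanes, built from a fixed rev + ext sequence. */
static void reverse_epi16(const int16_t src[8], int16_t out[8])
{
#if defined(__ARM_NEON)
    int16x8_t v = vld1q_s16(src);
    /* vrev64q_s16 reverses lanes within each 64-bit half:
     * {0,1,2,3,4,5,6,7} -> {3,2,1,0,7,6,5,4} */
    int16x8_t r = vrev64q_s16(v);
    /* vextq_s16(r, r, 4) takes lanes 4..7 then 0..3, swapping the halves:
     * {3,2,1,0,7,6,5,4} -> {7,6,5,4,3,2,1,0} */
    vst1q_s16(out, vextq_s16(r, r, 4));
#else
    /* Scalar fallback so the sketch compiles on non-Neon hosts. */
    for (int i = 0; i < 8; i++)
        out[i] = src[7 - i];
#endif
}
```

Other fixed patterns follow the same idea; for example, interleaving the low halves of two int16 vectors is a single `vzip1q_s16` on AArch64.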

A fixed code sequence isn't always efficient, and you may need something more generic that uses run-time permute indices. For this, NEON has the vtbl and vtbx instructions, which use the values in one register as indices into a small lookup table held in other registers, giving an arbitrary byte shuffle (vtbl writes zero for an out-of-range index, while vtbx leaves that destination byte unchanged). The nice thing about this is that the lookup table you index into can span multiple registers, so you can build tables bigger than 16 bytes, but there is some overhead in setting up the table and it does consume registers.
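A minimal sketch of the table-lookup approach applied to the original question, assuming an AArch64 little-endian target (where the 128-bit `vqtbl1q_u8` form of TBL is available). Rather than doing two byte shuffles and then merging, each 16-bit lane index `i` can be expanded into the byte pair `2*i, 2*i+1`, after which a single byte lookup moves whole int16 lanes. `shuffle_epi16` is a hypothetical helper name, not an existing intrinsic; indices of 8 or more are clamped so they produce zero, mirroring TBL's out-of-range behaviour. The scalar branch is only there so the sketch builds on non-AArch64 hosts.

```c
#include <stdint.h>
#include <string.h>
#if defined(__aarch64__)
#include <arm_neon.h>
#endif

/* Hypothetical helper (not a real intrinsic): out[i] = src[idx[i]] for
 * idx[i] in 0..7; any idx[i] >= 8 yields 0. Neon path assumes AArch64
 * and a little-endian target. */
static void shuffle_epi16(const int16_t src[8], const uint16_t idx[8], int16_t out[8])
{
#if defined(__aarch64__)
    /* Clamp to 8: lane index 8 maps to byte indices 16/17, which are out of
     * range for a 16-byte table, so TBL writes zero there. */
    uint16x8_t lane = vminq_u16(vld1q_u16(idx), vdupq_n_u16(8));
    /* Expand each lane index i into the byte pair {2*i, 2*i+1}:
     * low byte 2*i, high byte 2*i+1, i.e. 0x0100 + i * 0x0202. */
    uint16x8_t bytes = vmlaq_n_u16(vdupq_n_u16(0x0100), lane, 0x0202);
    uint8x16_t table = vreinterpretq_u8_s16(vld1q_s16(src));
    uint8x16_t res = vqtbl1q_u8(table, vreinterpretq_u8_u16(bytes));
    vst1q_s16(out, vreinterpretq_s16_u8(res));
#else
    /* Portable scalar model of the same byte-level lookup. */
    uint8_t table[16], res[16];
    memcpy(table, src, sizeof table);
    for (int i = 0; i < 8; i++) {
        unsigned c = idx[i] < 8 ? idx[i] : 8;
        unsigned lo = 2 * c, hi = 2 * c + 1; /* byte indices into the table */
        res[2 * i]     = lo < 16 ? table[lo] : 0;
        res[2 * i + 1] = hi < 16 ? table[hi] : 0;
    }
    memcpy(out, res, sizeof res);
#endif
}
```

If the indices are constant, the expanded byte-index vector can be computed once outside the loop, leaving one TBL per shuffle. On 32-bit ARMv7 there is no `vqtbl1q_u8`; `vtbl2_u8` with a `uint8x8x2_t` table covers the same 16 bytes but produces only eight result bytes per call, so it would be issued twice. To pick from two source vectors (16 int16 values), the same expansion should work with `vqtbl2q_u8` and a clamp at 16 instead of 8.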

Upvotes: 1
