Reputation: 147
I don't understand how to differentiate between vbit, vbsl and vbif when using NEON intrinsics. I need to do the vbit operation, but when I use the vbslq intrinsic I don't get the result I want.
For example I have a source vector like this:
uint8x16_t source = 39 62 9b 52 34 5b 47 48 47 35 0 0 0 0 0 0
The destination vector is:
uint8x16_t destination = 0 0 0 0 0 0 0 0 0 0 0 0 c3 c8 c5 d5
I would like to have as an output this:
39 62 9b 52 34 5b 47 48 47 35 0 0 c3 c8 c5 d5
meaning that I want to copy the first ten bytes from the source and leave the other 6 unchanged. I'm using this mask:
{0,0,0,0,0,0,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF};
What is the correct way to use vbslq_u8?
Upvotes: 2
Views: 2596
Reputation: 212969
The ARM documentation is not very clear, but it looks like you would need to use the intrinsic like this:
// Requires #include <arm_neon.h>
uint8x16_t src  = {0x39,0x62,0x9b,0x52,0x34,0x5b,0x47,0x48,
                   0x47,0x35,0x00,0x00,0x00,0x00,0x00,0x00};
uint8x16_t dest = {0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,
                   0x00,0x00,0x00,0x00,0xc3,0xc8,0xc5,0xd5};
// 0xff selects the byte from src, 0x00 keeps the byte from dest
uint8x16_t mask = {0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,
                   0xff,0xff,0x00,0x00,0x00,0x00,0x00,0x00};
dest = vbslq_u8(mask, src, dest);
Note that the order of bytes in the mask needs to correspond to the order in the source/dest registers (they seem to be swapped in your question?).
Also note that the first param to the intrinsic appears to be the selection mask, where a 1 bit selects the corresponding bit from the second param and a 0 bit selects the corresponding bit from the third param.
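To make the selection rule concrete, here is a minimal portable C sketch of what a 16-byte bitwise select computes. It does not use NEON at all, so it can be run on any machine to check a mask before trying the intrinsic. The function name bitwise_select_16 and its parameter names are hypothetical and chosen only for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Portable model of a 16-byte bitwise select (hypothetical helper,
   not part of arm_neon.h). For every bit position: where the mask
   bit is 1 the result bit comes from if_set, where it is 0 the
   result bit comes from if_clear. */
void bitwise_select_16(const uint8_t mask[16],
                       const uint8_t if_set[16],
                       const uint8_t if_clear[16],
                       uint8_t out[16])
{
    for (size_t i = 0; i < 16; ++i) {
        out[i] = (uint8_t)((mask[i] & if_set[i]) | (~mask[i] & if_clear[i]));
    }
}
```

Calling it with the mask, src and dest arrays above (in that argument order) gives 39 62 9b 52 34 5b 47 48 47 35 0 0 c3 c8 c5 d5, which is the output asked for in the question. As far as I understand, the VBSL, VBIT and VBIF instructions all perform this same selection and differ only in which operand register gets overwritten with the result, so with intrinsics you only write vbslq_u8 and the compiler picks the instruction form.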
Upvotes: 6