Reputation: 2145
I want to implement the bitwise extract (vector) operation with ARMv8 ASIMD assembly instructions. To put it concretely: in ARMv7 NEON, suppose I have some values in q15 and q11, and I have:
"vext.8 d30, d30, d22, #4 \n\t"
"vext.8 d31, d31, d23, #4 \n\t"
As you can see, the first instruction takes the top 4 bytes of d30 and the bottom 4 bytes of d22 and combines them into the 64-bit register d30, with the bytes from d30 ending up in the low half of the result. The second instruction does the same for the upper halves of the q vectors (d31 and d23). Now I want to implement exactly the same logic with ARMv8 ASIMD instructions. The ASIMD replacement for vext is ext, which is defined as:
EXT Vd.(T), Vn.(T), Vm.(T), #index
Bitwise extract (vector). Where (T) is either 8B or 16B. The index is an immediate value in the range 0 to nelem((T))-1.
My question is: how can I use this instruction to get the same result with my two SIMD vector registers, v15 and v11 for example?
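To make the intended result concrete, the effect of the two vext.8 instructions can be modelled in portable C. This is only a sketch: vext8_d and vext_pair are hypothetical helper names (not intrinsics), and byte 0 is assumed to be the least significant byte of each register.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: models ARMv7 `vext.8 Dd, Dn, Dm, #index`.
   The result is bytes index..7 of Dn followed by bytes 0..index-1 of Dm.
   index must be in the range 0..7. */
void vext8_d(uint8_t dst[8], const uint8_t n[8], const uint8_t m[8],
             unsigned index)
{
    uint8_t cat[16];
    memcpy(cat, n, 8);            /* low half of the concatenation: Dn  */
    memcpy(cat + 8, m, 8);        /* high half of the concatenation: Dm */
    memcpy(dst, cat + index, 8);  /* take 8 bytes starting at index     */
}

/* Models the two instructions from the question on 16-byte Q registers,
   where q15 = {d30 (bytes 0-7), d31 (bytes 8-15)} and
         q11 = {d22 (bytes 0-7), d23 (bytes 8-15)}. */
void vext_pair(uint8_t q15[16], const uint8_t q11[16])
{
    vext8_d(q15,     q15,     q11,     4);  /* vext.8 d30, d30, d22, #4 */
    vext8_d(q15 + 8, q15 + 8, q11 + 8, 4);  /* vext.8 d31, d31, d23, #4 */
}
```

In terms of 32-bit lanes, with a the old q15 and b the q11 value, this model gives {a1, b0, a3, b2}, which is the layout any ARMv8 sequence would need to reproduce in v15.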
Upvotes: 1
Views: 1123
Reputation: 822
Not sure whether you have already found your answer, or whether this matches your intended goal:
As @Dric512 indicated, I think you can use the SIMD data movement instruction INS.
In the example below, we insert the values 3 and 2 into 32-bit lanes of v10 (lane 0) and v11 (lane 1), respectively.
We then copy those elements into the two low 32-bit lanes of v15, so that together they fill the low 64-bit lane of v15.
mov w1, 3
mov w2, 2
ins v10.s[0], w1
ins v11.s[1], w2
ins v15.s[1], v11.s[1]
ins v15.s[0], v10.s[0]
Below are the gdb results, using p/t to display $v10.s.s, $v11.s.s and $v15.s.s in base 2 after each step, followed by a final p/t $v15.d.s to show the 64-bit bit pattern. I'm not sure if this helps, but maybe it will prime the pump.
67 mov w1, 3
(gdb) si
$82 = {0, 0, 0, 0}
$83 = {0, 0, 0, 0}
$84 = {0, 0, 0, 0}
68 mov w2, 2
(gdb)
$85 = {0, 0, 0, 0}
$86 = {0, 0, 0, 0}
$87 = {0, 0, 0, 0}
70 ins v10.s[0], w1
(gdb)
$88 = {11, 0, 0, 0}
$89 = {0, 0, 0, 0}
$90 = {0, 0, 0, 0}
71 ins v11.s[1], w2
(gdb)
$91 = {11, 0, 0, 0}
$92 = {0, 10, 0, 0}
$93 = {0, 0, 0, 0}
73 ins v15.s[1], v11.s[1]
(gdb)
$94 = {11, 0, 0, 0}
$95 = {0, 10, 0, 0}
$96 = {0, 10, 0, 0}
74 ins v15.s[0], v10.s[0]
(gdb)
$97 = {11, 0, 0, 0}
$98 = {0, 10, 0, 0}
$99 = {11, 10, 0, 0}
.exit0 () at stuff.s:78
78 _exit
(gdb) p/t $v15.d.s
$100 = {1000000000000000000000000000000011, 0}
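The final 64-bit value can be cross-checked with a small C model of the same six instructions. This is a sketch under stated assumptions: vreg_t, ins_gpr, ins_elem, lane_d and replay_ins_example are hypothetical names introduced here, and the 64-bit lane is assumed to be formed with s[0] in the low 32 bits.

```c
#include <stdint.h>

/* Hypothetical model of a 128-bit vector register as four 32-bit lanes. */
typedef struct { uint32_t s[4]; } vreg_t;

/* Models `ins Vd.s[di], Wn`: write a general-purpose value into one lane. */
void ins_gpr(vreg_t *vd, unsigned di, uint32_t w)
{
    vd->s[di] = w;
}

/* Models `ins Vd.s[di], Vn.s[si]`: copy one lane, leave the others alone. */
void ins_elem(vreg_t *vd, unsigned di, const vreg_t *vn, unsigned si)
{
    vd->s[di] = vn->s[si];
}

/* 64-bit view of lanes 2*i and 2*i+1, with the even lane in the low bits. */
uint64_t lane_d(const vreg_t *v, unsigned i)
{
    return ((uint64_t)v->s[2 * i + 1] << 32) | v->s[2 * i];
}

/* Replays the example, starting from zeroed registers. */
uint64_t replay_ins_example(void)
{
    vreg_t v10 = {{0, 0, 0, 0}};
    vreg_t v11 = {{0, 0, 0, 0}};
    vreg_t v15 = {{0, 0, 0, 0}};

    ins_gpr(&v10, 0, 3);          /* mov w1, 3 ; ins v10.s[0], w1 */
    ins_gpr(&v11, 1, 2);          /* mov w2, 2 ; ins v11.s[1], w2 */
    ins_elem(&v15, 1, &v11, 1);   /* ins v15.s[1], v11.s[1]       */
    ins_elem(&v15, 0, &v10, 0);   /* ins v15.s[0], v10.s[0]       */

    return lane_d(&v15, 0);       /* low 64-bit lane of v15       */
}
```

In this model replay_ins_example() returns 0x0000000200000003, i.e. binary 10 followed by 30 zeros and then 11, which agrees with the p/t output above.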
Upvotes: 0
Reputation: 3729
You should first note that in AArch64 the registers are not organised in the same way. In AArch32, Q15 is {D31, D30}. In AArch64, D31 is the low 64 bits of Q31, which is written V31 when referring to its elements.
There is no direct equivalent in AArch64 in this case, because you cannot directly access the top 64 bits of the Q registers, but I think you should be able to replace it with:
INS V15.S[0], V11.S[0]
INS V15.S[2], V11.S[2]
Ref: http://infocenter.arm.com/help/topic/com.arm.doc.dui0802b/INS_advsimd_elt_vector.html
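A lane-level model is a cheap way to check a candidate sequence against the vext result before committing to it. The sketch below is not from the answer: all function names are hypothetical, a stands for the old v15, b for v11, and the target layout {a1, b0, a3, b2} is what the two vext.8 #4 instructions produce per 32-bit lane. It compares the INS pair above with one possible alternative, rev64 followed by trn1.

```c
#include <stdint.h>
#include <string.h>

/* Target: lane layout of the two `vext.8 ..., #4` instructions. */
void vext_target(uint32_t out[4], const uint32_t a[4], const uint32_t b[4])
{
    uint32_t r[4] = { a[1], b[0], a[3], b[2] };
    memcpy(out, r, sizeof r);
}

/* Models `ins v15.s[0], v11.s[0]` and `ins v15.s[2], v11.s[2]`:
   lanes 0 and 2 are overwritten, lanes 1 and 3 stay where they are. */
void ins_pair(uint32_t a[4], const uint32_t b[4])
{
    a[0] = b[0];
    a[2] = b[2];
}

/* Models `rev64 Vd.4s, Vn.4s`: swap the 32-bit lanes in each 64-bit half. */
void rev64_4s(uint32_t out[4], const uint32_t n[4])
{
    uint32_t r[4] = { n[1], n[0], n[3], n[2] };
    memcpy(out, r, sizeof r);
}

/* Models `trn1 Vd.4s, Vn.4s, Vm.4s`: interleave the even-numbered lanes. */
void trn1_4s(uint32_t out[4], const uint32_t n[4], const uint32_t m[4])
{
    uint32_t r[4] = { n[0], m[0], n[2], m[2] };
    memcpy(out, r, sizeof r);
}

/* Candidate: rev64 v15.4s, v15.4s ; trn1 v15.4s, v15.4s, v11.4s */
void rev64_trn1(uint32_t a[4], const uint32_t b[4])
{
    rev64_4s(a, a);
    trn1_4s(a, a, b);
}
```

In this model the INS pair yields {b0, a1, b2, a3}, which has the right elements but not the vext order, whereas the rev64 plus trn1 candidate yields {a1, b0, a3, b2}. Treat this as a model-level check only; the real sequence should still be verified on hardware or in a simulator.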
Upvotes: 2