A23149577

Reputation: 2155

Accessing half of a register in AArch64 advanced SIMD

I am new to AArch64 Advanced SIMD (NEON) and I want to port AArch32 code to AArch64. In AArch32, if I wanted to access the lower or upper half of a register, I simply used Dn instead of Qn. For example, to access the lower 64 bits of Q12, I simply referred to D24. However, I cannot figure out how to access half of a Vn register in AArch64. I would like to access the upper half of a Vn register. If I write Vn.2S, I assume it gives me the lower half of the register. Is that correct? If so, how can I access the upper half?

Upvotes: 1

Views: 1279

Answers (3)

zhiyuan

Reputation: 1

According to the Arm Architecture Reference Manual for A-profile architecture, it's impossible: unlike AArch32, AArch64 has no register name that refers to the upper half of a Vn register.

In section E1.3.1.1, Advanced SIMD views of the register file, the document describes the registers in AArch32 as follows:

Advanced SIMD can view this register file as:

• Sixteen 128-bit quadword registers, Q0-Q15.

• Thirty-two 64-bit doubleword registers, D0-D31.

These views can be used simultaneously. For example, a program might hold 64-bit vectors in D0 and D1 and a 128-bit vector in Q1.
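The AArch32 aliasing quoted above is simple index arithmetic: Qn overlaps D(2n) and D(2n+1), which is why the lower half of Q12 is D24. A minimal sketch in C; the helper names are hypothetical and only illustrate the mapping:

```c
/* Sketch of the AArch32 Advanced SIMD aliasing described above
 * (hypothetical helper names). Qn (n = 0..15) overlaps D(2n), its
 * lower 64 bits, and D(2n+1), its upper 64 bits. */

/* D register holding the lower (half = 0) or upper (half = 1) 64 bits
 * of Qn; returns -1 for an out-of-range argument. */
int aarch32_q_to_d(int q, int half)
{
    if (q < 0 || q > 15 || (half != 0 && half != 1))
        return -1;
    return 2 * q + half;
}

/* Q register that Dn is part of; returns -1 if d is out of range. */
int aarch32_d_to_q(int d)
{
    if (d < 0 || d > 31)
        return -1;
    return d / 2;
}

/* Nonzero if Dn is the upper half of its Q register. */
int aarch32_d_is_upper_half(int d)
{
    return d >= 0 && d <= 31 && (d % 2) == 1;
}
```

For the question's example, aarch32_q_to_d(12, 0) returns 24 and aarch32_q_to_d(12, 1) returns 25. AArch64 has no equivalent of the second result.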

However, in section B1.2, Registers in AArch64 Execution state, the document describes the registers in AArch64 as follows:

32 SIMD&FP registers, V0 to V31. Each can be accessed as:

• A 128-bit register named Q0 to Q31.

• A 64-bit register named D0 to D31.

• A 32-bit register named S0 to S31.

• A 16-bit register named H0 to H31.

• An 8-bit register named B0 to B31.

• A 128-bit vector of elements. See SIMD vectors in AArch64 state.

• A 64-bit vector of elements. See SIMD vectors in AArch64 state.

Where the number of bits described by a register name does not occupy an entire SIMD&FP register, it refers to the least significant bits.
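So Vn.2S does name the lower half. The upper 64 bits have no register name of their own; they can only be addressed as the 64-bit element Vn.D[1] (for example `mov x0, v0.d[1]`), or consumed directly by "2"-suffixed instructions such as `uaddl2` and `xtn2`. The sketch below is a portable C model of these rules, not NEON code; the `vreg_model` type and its helpers are hypothetical. It also encodes one AArch64 behaviour that the quote does not mention: writing a register as a scalar or 64-bit vector (for example Dn or Vn.2S) clears bits 127:64, whereas inserting into a single element preserves the rest of the register.

```c
#include <stdint.h>

/* Hypothetical model of one AArch64 SIMD&FP register Vn (not NEON code).
 * byte[i] holds bits 8*i+7 .. 8*i of the 128-bit register. */
typedef struct {
    uint8_t byte[16];
} vreg_model;

/* Read 64-bit element D[idx]: idx 0 = bits 63:0, idx 1 = bits 127:64. */
static uint64_t get_elem_d(const vreg_model *v, int idx)
{
    uint64_t x = 0;
    for (int i = 7; i >= 0; i--)
        x = (x << 8) | v->byte[8 * idx + i];
    return x;
}

/* Reading Dn (or Vn.2S, Vn.8B, ...) sees only the least significant 64 bits. */
static uint64_t read_d(const vreg_model *v)
{
    return get_elem_d(v, 0);
}

/* The upper half has no register name; it is reachable only as the
 * element D[1], as in "mov x0, v0.d[1]". */
static uint64_t read_upper_half(const vreg_model *v)
{
    return get_elem_d(v, 1);
}

/* Writing Dn as a scalar or 64-bit vector also clears bits 127:64. */
static void write_d(vreg_model *v, uint64_t x)
{
    for (int i = 0; i < 8; i++) {
        v->byte[i] = (uint8_t)(x >> (8 * i));
        v->byte[8 + i] = 0;
    }
}

/* Inserting into element D[idx] (e.g. "mov v0.d[1], x1") keeps the other half. */
static void set_elem_d(vreg_model *v, int idx, uint64_t x)
{
    for (int i = 0; i < 8; i++)
        v->byte[8 * idx + i] = (uint8_t)(x >> (8 * i));
}
```

The zeroing in write_d is the main porting trap: code that relied on writing D24 without disturbing D25 in AArch32 has to use an element insert in AArch64.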

Upvotes: 0

Rauli Kumpulainen

Reputation: 336

I have successfully used pointers to select either the upper or lower half of an Arm Neon vector.

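```c
#include <arm_neon.h>
#include <stdio.h>
int main(void) {
    uint32x4_t vector = { 1, 2, 3, 4 };
    uint32x2_t *upperhalf = (uint32x2_t *) &vector[2]; /* lanes 2-3 */
    uint32x2_t *lowerhalf = (uint32x2_t *) &vector[0]; /* lanes 0-1 */
    *lowerhalf = *upperhalf;
    printf("%u\n", (unsigned) vector[0]); /* prints 3 */
    return 0;
}
```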

This prints 3. In effect, it tells the compiler to operate on one of the two 64-bit (doubleword) halves that make up the 128-bit (quadword) register. It does not necessarily mean that memory will be read or written; the compiler can see that you want to operate on that half of the register directly.

This works with GCC 8, and possibly with older releases. Clang 7 gave a "targeting vector..." error message. I have not been able to use the pointer to index individual lanes of the 64-bit half; however, using it as an ordinary vector of the type it is cast to, either as a source or as a destination, has always worked. Below is another example, which byte-swaps each 32-bit lane of the lower half through the pointer.

```c
*lowerhalf = vreinterpret_u32_u8(vrev32_u8(vreinterpret_u8_u32(*lowerhalf)));
```

It is not good practice to target odd indexes (for example &vector[1]), because the resulting 64-bit half straddles both halves of the register. I have not tried to see what that does, but the compiler will likely have to shuffle data through temporary register lanes to complete the operation. Using pointers in this way has also worked when the vectors are members of a struct.
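On Arm targets, `arm_neon.h` also provides `vget_low_u32`, `vget_high_u32` and `vcombine_u32`, which select or join vector halves without pointer casts; for example, `vcombine_u32(vget_high_u32(vector), vget_high_u32(vector))` gives the same {3, 4, 3, 4} result as the pointer example above. Because those intrinsics only compile for Arm, the sketch below is a portable C model of the same lane selection; the `u32x4_model`/`u32x2_model` types and the `model_*` helpers are hypothetical stand-ins for `uint32x4_t`, `uint32x2_t` and the intrinsics.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for uint32x4_t (lanes 0..3) and uint32x2_t. */
typedef struct { uint32_t lane[4]; } u32x4_model;
typedef struct { uint32_t lane[2]; } u32x2_model;

/* Lanes 0-1, modelling vget_low_u32. */
static u32x2_model model_get_low(u32x4_model v)
{
    u32x2_model h;
    memcpy(h.lane, &v.lane[0], sizeof h.lane);
    return h;
}

/* Lanes 2-3, modelling vget_high_u32. */
static u32x2_model model_get_high(u32x4_model v)
{
    u32x2_model h;
    memcpy(h.lane, &v.lane[2], sizeof h.lane);
    return h;
}

/* Join two halves into one vector, modelling vcombine_u32(low, high). */
static u32x4_model model_combine(u32x2_model low, u32x2_model high)
{
    u32x4_model v;
    memcpy(&v.lane[0], low.lane, sizeof low.lane);
    memcpy(&v.lane[2], high.lane, sizeof high.lane);
    return v;
}
```

Usage mirroring "*lowerhalf = *upperhalf": starting from {1, 2, 3, 4}, model_combine(model_get_high(vector), model_get_high(vector)) yields {3, 4, 3, 4}. Passing and returning the halves by value, rather than aliasing them through casted pointers, also sidesteps the compiler-specific behaviour noted above.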

Upvotes: 0

I tried this as well. As far as I can tell from the manual, there is no way to access the register slot-wise: V0, D0 and S0 all hold the same data, with D0 and S0 referring to the least significant bits of V0.

In AArch32, by contrast, Q0 is made up of D0 and D1, and D0 in turn is made up of S0 and S1.

Upvotes: 0
