Reputation: 102245
I was looking at some ARM disassemblies for a few ARM dev-boards we test on. They were produced with NEON intrinsic vld1q_u32
using -march=armv7-a -mfloat-abi=hard -mfpu=neon
.
One one particular machine with NEON we see (/proc/cpuinfo half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm
):
0: b5f0 push {r4, r5, r6, r7, lr}
...
20: f964 4a8f vld1.32 {d20-d21}, [r4]
On another NEON machine we see (/proc/cpuinfo : swp half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt
):
0: e92d 4ff0 stmdb sp!, {r4, r5, r6, r7, r8, r9, sl, fp, lr}
...
28: f964 2a8f vld1.32 {d18-d19}, [r4]
And on a ARMv8 machine we see (/proc/cpuinfo : fp asimd evtstrm aes pmull sha1 sha2 crc32
):
0: 3dc00021 ldr q1, [x1]
...
10: 3dc00c22 ldr q2, [x1,#48]
14: 3dc01023 ldr q3, [x1,#64]
I understand the 2-D and 1-Q are simply different views of the same bank of registers. What I am not clear on is why ARMv7 NEON is performing the multiple register load instead of a 1Q load.
My question is, what is the difference between the vld1.32 {2-D}
and vld1q.32 1-Q
. Or why is the compiler not generating the 1-Q loads in all cases?
Upvotes: 2
Views: 1803
Reputation: 13317
The difference here lies between 32 bit ARM (aka AArch32) and AArch64.
The fact that 2 D registers are aliased over one Q register is true for 32 bit mode, but not in 64 bit mode. In AArch64, dX
is the first half of qX
, not of q(X/2)
as in AArch32, and there's no d
register name for addressing the upper half of the q
register.
If you, in AArch32, assemble the instruction vld1.32 {q0}, [r0]
, it will turn into the same opcode f920 0a8f
(in thumb mode) as you get if you assemble vld1.32 {d0-d1}, [r0]
. So it's basically up to the disassembler to choose which form it prefers to use for display (although there may be guidelines for disassemblers, saying it should prefer to use the D register form).
On AArch64, the two forms are distinct since the registers aren't aliased in the same way, so if you ask for a 128 bit load to a Q register, that's what you get and there's no ambiguity about it.
Upvotes: 4