I am using Arm GNU Toolchain 12.2.Rel1 (Build arm-12.24) 12.2.1 20221205 on Windows 11. When compiling the following pair of NEON instructions (vector multiply by scalar):
vmull.u16 q7, d19, d0[0]
vmull.u16 q7, d19, d8[0]
the first one compiles correctly, but for the second one I get an error message:
ccuFDHko.s:5546: Error: scalar out of range for multiply instruction -- `vmull.u16 q7,d19,d8[0]'
After testing a few more combinations of registers for the three operands, I am inclined to conclude that only registers below d8 (d0 to d7) can be used for the scalar (the third operand).
I did not find any reference to this restriction in the NEON Programmer's Guide or on the ARM site.
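To state my guess precisely, here is a minimal sketch in C. It assumes (I have not confirmed this in the documentation) that the by-scalar encoding has a single 5-bit field shared by the scalar's D-register number and its lane index. All names below are hypothetical and only model the guess; they are not part of the toolchain.

```c
#include <assert.h>
#include <stdbool.h>

/* ASSUMPTION (not taken from the toolchain or any manual quoted here):
   the scalar operand's D-register number and lane index share one
   5-bit field in the instruction encoding. */
enum { SCALAR_FIELD_BITS = 5 };

/* Bits needed to select one lane of elem_bits inside a 64-bit D register. */
static int lane_index_bits(int elem_bits)
{
    assert(elem_bits == 16 || elem_bits == 32);
    return elem_bits == 16 ? 2 : 1;   /* 4 lanes need 2 bits, 2 lanes need 1 */
}

/* Highest D-register number left for the scalar under this model. */
static int max_scalar_dreg(int elem_bits)
{
    int reg_bits = SCALAR_FIELD_BITS - lane_index_bits(elem_bits);
    return (1 << reg_bits) - 1;
}

/* True if d<dreg>[lane] would be encodable as the scalar operand. */
static bool scalar_operand_ok(int elem_bits, int dreg, int lane)
{
    int lanes = 64 / elem_bits;
    return dreg >= 0 && dreg <= max_scalar_dreg(elem_bits)
        && lane >= 0 && lane < lanes;
}
```

Under this model, `scalar_operand_ok(16, 0, 0)` holds and `scalar_operand_ok(16, 8, 0)` does not, which matches the two instructions above. The model would also predict that a `.u32` scalar could use d0 to d15, which I have not tested.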
Also, when I used the intrinsic:
uint32x4_t vmull_lane_u16(uint16x4_t vec1, uint16x4_t val2, __constrange(0, 3) int val3);
the generated code always used "d7[0]" as the scalar.
I would appreciate any hints about this behavior, or a reasonable explanation for it.
Thanks, Julio