Reputation: 3
I have code for ARM NEON armv7-a:
vst2.u8 {d1,d3}, [%1]!
I ported it to AArch64 like this:
st2 {v1.8b,v3.8b},[%1],#16
and got an error: Error: invalid register list at operand 1 -- `st2 {v1.8b,v3.8b},[x1],#16'
According to the documentation, this is valid:
ST2 {Vt.<T>, Vt+2.<T>}, vaddr
I can't figure out the problem.
P.S. If I change it to
st2 {v1.8b,v2.8b},[%1],#16
it compiles without an error message.
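For reference, here is what the store itself does, modelled in plain portable C rather than NEON. This is a sketch for illustration only; `st2_8b_post` is a hypothetical name, not an intrinsic. ST2 interleaves the bytes of its two registers and, in the post-index form, advances the base by the 16 bytes written, which is where the `#16` (and the `!` writeback in the ARMv7-A version) comes from:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Portable model (not NEON code) of
 *   st2 {va.8b, vb.8b}, [xn], #16    (AArch64)
 *   vst2.u8 {da, db}, [rn]!           (ARMv7-A)
 * Byte i of the first register is stored at dst[2*i], byte i of the
 * second register at dst[2*i + 1]; the returned pointer is the
 * written-back base address. */
static uint8_t *st2_8b_post(uint8_t *dst, const uint8_t a[8], const uint8_t b[8])
{
    assert(dst != NULL && a != NULL && b != NULL);
    for (size_t i = 0; i < 8; i++) {
        dst[2 * i]     = a[i];
        dst[2 * i + 1] = b[i];
    }
    return dst + 16; /* 2 registers x 8 bytes */
}
```

With d1 = {0..7} and d3 = {100..107}, the 16 output bytes are 0, 100, 1, 101, ..., 7, 107. The model does not care which registers hold the data; any restriction on the register list comes from the instruction encoding.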
Upvotes: 0
Views: 837
Reputation: 2493
I am referring to the ARM A64 instruction set architecture here, which was last updated in 2018.
The first link in your comment was only about the AArch32 instruction set. The second link was about the AArch64 instruction set, but its PDF title marks it as interim, and it was published in 2011. The format
ST2 { <Vt>.<T>, <Vt+2>.<T> }, vaddr
is mentioned there (page 89), but it is not included in the current version.
In the current version, ST2 (multiple structures) is encoded as follows (see page 1085):
┌───┬───┬────────┬───┬───┬───┬───────┬──────┬────┬───────┬───────┐
│ 0 │ Q │ 001100 │ I │ 0 │ 0 │ mmmmm │ 1000 │ ss │ nnnnn │ ttttt │
└───┴───┴────────┴───┴───┴───┴───────┴──────┴────┴───────┴───────┘
                                Rm           size   Rn      Rt
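A small C sketch of how those fields pack into one 32-bit word. `encode_st2` is a hypothetical helper written for illustration from the A64 layout (bits 29-24 = 001100, bit 23 = the post-index flag I, bit 22 = L = 0 for a store, bit 21 = 0, opcode bits 15-12 = 1000 for ST2). Note that the parameter list has nowhere to put a second vector register:

```c
#include <assert.h>
#include <stdint.h>

/* Packs the ST2 (multiple structures) fields into an instruction word.
 *   q    : bit 30, 0 = 64-bit vectors (.8b, .4h, .2s), 1 = 128-bit
 *   post : bit 23 ("I"), 1 for the post-indexed forms
 *   rm   : bits 20-16, 0 for no offset, 31 for an immediate offset
 *   size : bits 11-10, element size (0 = 8-bit ... 3 = 64-bit)
 *   rn   : bits 9-5, base register Xn|SP
 *   rt   : bits 4-0, the FIRST vector register only */
static uint32_t encode_st2(uint32_t q, uint32_t post, uint32_t rm,
                           uint32_t size, uint32_t rn, uint32_t rt)
{
    assert(q <= 1 && post <= 1 && size <= 3);
    assert(rm <= 31 && rn <= 31 && rt <= 31);
    return (q << 30)
         | (0x0Cu << 24)   /* bits 29-24 = 001100 */
         | (post << 23)    /* bit 22 (L, 0 = store) and bit 21 stay 0 */
         | (rm << 16)
         | (0x8u << 12)    /* opcode 1000 = ST2 */
         | (size << 10)
         | (rn << 5)
         | rt;
}
```

By this layout, `st2 {v1.8b, v2.8b}, [x1], #16` (q = 0, post = 1, rm = 31, size = 0, rn = 1, rt = 1) packs to 0x0C9F8021; the register v2 never appears in the word.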
There are three types of offset the instruction can be used with:
No offset (Rm == 00000 and I == 0):
ST2 { <Vt>.<T>, <Vt2>.<T> }, [<Xn|SP>]
Immediate offset (Rm == 11111 and I == 1):
ST2 { <Vt>.<T>, <Vt2>.<T> }, [<Xn|SP>], <imm>
Register offset (Rm != 11111 and I == 1):
ST2 { <Vt>.<T>, <Vt2>.<T> }, [<Xn|SP>], <Xm>
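Going the other way, the three cases can be told apart from an instruction word by looking only at I (bit 23) and Rm (bits 20-16). Again a hypothetical C sketch; `st2_classify` and the enum names are made up for illustration, and it does not check reserved size/Q combinations:

```c
#include <stdint.h>

enum st2_form { ST2_NO_OFFSET, ST2_POST_IMM, ST2_POST_REG, ST2_NOT_ST2 };

/* Classifies a 32-bit word as one of the three ST2 (multiple
 * structures) addressing forms. */
static enum st2_form st2_classify(uint32_t insn)
{
    /* Fixed bits: 31 = 0, 29-24 = 001100, 22 (L) = 0, 21 = 0,
     * 15-12 = 1000. Q, I, Rm, size, Rn and Rt are masked out. */
    if ((insn & 0xBF60F000u) != 0x0C008000u)
        return ST2_NOT_ST2;

    uint32_t i  = (insn >> 23) & 1u;
    uint32_t rm = (insn >> 16) & 31u;

    if (i == 0)  /* no offset: Rm field is 00000 */
        return rm == 0 ? ST2_NO_OFFSET : ST2_NOT_ST2;
    return rm == 31 ? ST2_POST_IMM : ST2_POST_REG;
}
```

For example, 0x0C008000 classifies as no offset, 0x0C9F8021 as an immediate offset, and 0x0C828000 (Rm = 2) as a register offset.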
<imm> is #16 or #32 here, depending on Q (two 8-byte or two 16-byte registers are stored). Only the first register's index t is stored in the encoding; the second register's index is always computed as (t + 1) mod 32.
That's why you got the error: the registers must follow one another. There is simply not enough room in the 32-bit word to encode the second register separately; the base and offset register fields (Rn and Rm) already take up a lot of that space.
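The same rule as a tiny C sketch: the post-index immediate follows from Q alone, and the second register is derived rather than encoded, so a list like your {v1.8b, v3.8b} simply has no encoding. The helper names are hypothetical, and this only illustrates the rule; it is not how GNU as actually validates register lists:

```c
#include <assert.h>
#include <stdbool.h>

/* Post-index immediate of ST2: two registers of 8 bytes (Q = 0)
 * or 16 bytes (Q = 1) are stored, so the base advances by 16 or 32. */
static unsigned st2_post_imm(unsigned q)
{
    return q ? 32u : 16u;
}

/* Only Rt is encoded; the second register is always (t + 1) mod 32. */
static unsigned st2_second_reg(unsigned t)
{
    assert(t < 32u);
    return (t + 1u) % 32u;
}

/* Can {v<first>, v<second>} be encoded as an ST2 register list? */
static bool st2_list_encodable(unsigned first, unsigned second)
{
    return first < 32u && second == st2_second_reg(first);
}
```

Here `st2_list_encodable(1, 3)` is false, while `(1, 2)` and the wrap-around case `(31, 0)` are true.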
Wouldn't it be possible to encode the second register anyway? When I == 0, Rm is set to 00000, but that is just a convention. Those bits could be used for our purpose, but only when no immediate or register offset is specified.
I also see why the format with <Vt+2> was not adopted from the draft: it could only be encoded for this special case. Supporting it would make the chip's implementation more complex and is simply not worthwhile.
Upvotes: 2