yuri

Reputation: 3

ARM NEON to aarch64

I have this ARM NEON code for ARMv7-A:

vst2.u8   {d1,d3}, [%1]!

I ported it to AArch64 like this:

st2 {v1.8b,v3.8b},[%1],#16

and got this error:

Error: invalid register list at operand 1 -- `st2 {v1.8b,v3.8b},[x1],#16'

According to the documentation, this is valid:

ST2 {Vt.<T>, Vt+2.<T>}, vaddr 

I can't figure out the problem.

P.S. If I change it to

st2 {v1.8b,v2.8b},[%1],#16

the compiler does not report an error.

Upvotes: 0

Views: 837

Answers (1)

fcdt

Reputation: 2493

I am referring to the ARM A64 instruction set architecture documentation here, which was last updated in 2018.

The first link in your comment covers only the AArch32 instruction set. The second link is about the AArch64 instruction set, but the PDF title marks it as interim, and it was published in 2011. The format

ST2 { <Vt>.<T>, <Vt+2>.<T> }, vaddr

is mentioned there (page 89), but it is not part of the current version.

Encoding of ST2

In the current version, ST2 (multiple structures) is encoded as follows (see page 1085):

┌───┬───┬────────┬───┬───┬───┬───────┬──────┬────┬───────┬───────┐
│ 0 │ Q │ 001100 │ I │ 0 │ 0 │ mmmmm │ 1000 │ ss │ nnnnn │ ttttt │
└───┴───┴────────┴───┴───┴───┴───────┴──────┴────┴───────┴───────┘
                       L        Rm    opcode size   Rn      Rt
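To make the layout concrete, here is a small C sketch that packs the ST2 fields into a 32-bit word, with the offset flag I at bit 23 and L (0 for a store) at bit 22. The function name encode_st2 and its parameter order are my own hypothetical choices, not an Arm or assembler API. Note that it takes only one vector register, t; there is no field for a second one.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: pack ST2 (multiple structures, two registers).
 * i = 0: no offset (rm must be 0); i = 1: post-index (rm = 31 means #imm). */
uint32_t encode_st2(unsigned q, unsigned i, unsigned rm,
                    unsigned size, unsigned rn, unsigned rt)
{
    assert(q <= 1 && i <= 1 && size <= 3);
    assert(rm < 32 && rn < 32 && rt < 32);
    assert(i == 1 || rm == 0);          /* no-offset form uses Rm == 00000 */

    uint32_t w = 0;
    w |= (uint32_t)q    << 30;          /* Q: 0 = 64-bit regs, 1 = 128-bit regs */
    w |= 0x0Cu          << 24;          /* bits 29..24 = 001100 */
    w |= (uint32_t)i    << 23;          /* I: 1 = post-index offset present */
    /* bit 22 (L) = 0 for a store, bit 21 = 0 */
    w |= (uint32_t)rm   << 16;          /* Rm: 11111 = immediate, else Xm */
    w |= 0x8u           << 12;          /* opcode 1000 = two-register ST2 */
    w |= (uint32_t)size << 10;          /* element size: 00 = 8-bit */
    w |= (uint32_t)rn   << 5;           /* base register Xn|SP */
    w |= (uint32_t)rt;                  /* first vector register t only */
    return w;
}
```

For example, encode_st2(0, 1, 31, 0, 1, 1) packs st2 {v1.8b, v2.8b}, [x1], #16 and yields 0x0C9F8021.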

There are three types of offset the instruction can be used with:

  • No offset (I == 0, Rm == 00000):

    ST2 { <Vt>.<T>, <Vt2>.<T> }, [<Xn|SP>]
    
  • Immediate offset, post-index (I == 1, Rm == 11111):

    ST2 { <Vt>.<T>, <Vt2>.<T> }, [<Xn|SP>], <imm>
    
  • Register offset, post-index (I == 1, Rm != 11111):

    ST2 { <Vt>.<T>, <Vt2>.<T> }, [<Xn|SP>], <Xm>
    
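A decoder can tell the three forms apart using only the I bit and the Rm field. The following C sketch first checks the fixed ST2 bits and then classifies the word; classify_st2 and the st2_form enum are hypothetical names chosen for illustration.

```c
#include <stdint.h>

enum st2_form { ST2_NO_OFFSET, ST2_IMM_OFFSET, ST2_REG_OFFSET, ST2_NOT_ST2 };

/* Hypothetical helper: classify a 32-bit word as one of the three ST2 forms. */
enum st2_form classify_st2(uint32_t w)
{
    /* Fixed bits: bit 31 = 0, bits 29..24 = 001100, bit 22 (L) = 0,
     * bit 21 = 0, bits 15..12 = 1000. Q (bit 30) and I (bit 23) vary. */
    const uint32_t mask  = 0xBF60F000u;
    const uint32_t match = 0x0C008000u;
    if ((w & mask) != match)
        return ST2_NOT_ST2;

    unsigned i  = (w >> 23) & 1u;      /* offset flag */
    unsigned rm = (w >> 16) & 0x1Fu;   /* Rm field, bits 20..16 */

    if (i == 0)                        /* no offset: Rm is expected to be 00000 */
        return rm == 0 ? ST2_NO_OFFSET : ST2_NOT_ST2;
    return rm == 31 ? ST2_IMM_OFFSET : ST2_REG_OFFSET;
}
```

An LD2 word (L == 1) fails the mask check, so only stores are classified.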

<imm> is #16 or #32, depending on Q. Only the index t of the first register is stored in the encoding; the index of the second register is always computed as (t+1) mod 32.
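Decoding the register list is the same rule in reverse: read t from bits 4..0 and derive the second index from it. A minimal C sketch, where st2_regs and st2_regs_from_word are hypothetical names:

```c
#include <stdint.h>

struct st2_regs { unsigned vt, vt2, imm; };

/* Hypothetical helper: recover the register list and the post-index
 * immediate from an ST2 word. */
struct st2_regs st2_regs_from_word(uint32_t w)
{
    struct st2_regs r;
    unsigned q = (w >> 30) & 1u;
    r.vt  = w & 0x1Fu;             /* only t is stored (bits 4..0) */
    r.vt2 = (r.vt + 1u) % 32u;     /* second register is implied: (t+1) mod 32 */
    r.imm = q ? 32u : 16u;         /* 2 x 8 bytes if Q = 0, 2 x 16 bytes if Q = 1 */
    return r;
}
```

For t = 31 this yields the list {v31, v0}, which shows the wrap-around.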

That's why you got the error: the registers in the list must be consecutive. There is simply no room to encode the second register separately, because the fields for the two index registers (Rn and Rm) already use up the available bits.
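The check the assembler has to make therefore boils down to one comparison. This is only a sketch of the rule, not actual GNU as source code; st2_list_encodable is a hypothetical name.

```c
#include <stdbool.h>

/* Hypothetical helper: can a two-register ST2 list {V<first>, V<second>}
 * be encoded? Only t is stored, so second must equal (first + 1) mod 32. */
bool st2_list_encodable(unsigned first, unsigned second)
{
    return first < 32 && second < 32 && second == (first + 1u) % 32u;
}
```

With this rule, {v1.8b, v3.8b} is rejected, while {v1.8b, v2.8b} (as in the P.S.) and the wrapping {v31.8b, v0.8b} are accepted.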

Consideration

Couldn't the second register be encoded somewhere after all? When I == 0, Rm is set to 00000, but that is just a convention. That field could hold the second register index, but only when neither an immediate nor a register offset is specified.
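To illustrate this idea, here is a purely hypothetical C sketch of such an encoding. It is NOT a real Arm encoding, and no assembler or CPU accepts it; the function name is made up. It only shows that, in the no-offset form, the Rm bits would be free for a second register index, whereas a post-index form needs them for #imm or Xm.

```c
#include <stdint.h>

/* HYPOTHETICAL, not part of the Arm architecture: a no-offset ST2 variant
 * that stores the second register index in the Rm field (bits 20..16). */
uint32_t encode_st2_two_regs_hypothetical(unsigned q, unsigned size,
                                          unsigned rn, unsigned vt, unsigned vt2)
{
    uint32_t w = 0x0C008000u;            /* ST2 fixed bits, I = 0, opcode 1000 */
    w |= (uint32_t)(q & 1u)    << 30;    /* Q */
    w |= (uint32_t)(vt2 & 31u) << 16;    /* second register in place of Rm */
    w |= (uint32_t)(size & 3u) << 10;    /* element size */
    w |= (uint32_t)(rn & 31u)  << 5;     /* base register */
    w |= (uint32_t)(vt & 31u);           /* first register */
    return w;
}
```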

I think this is also why the <Vt+2> format from the draft was not adopted: it could only be encoded for this special case. Supporting it would make the chip implementation more complex and is simply not worthwhile.

Upvotes: 2
