ianhobo

Reputation: 210

NEON inline assembly - store query

I am trying to learn how to use NEON with gcc and inline assembly. While it is confusing and slow going, I am making some progress (it has been 10 years since I last tried writing assembly). My simple program loads a (small) vector, performs a saturating add on it, and stores the result. The problem I am having is that I cannot seem to store the result in the place I want. When I use the otherwise unused array (r) in my output list, I get the error "impossible constraint in asm". If I then create a second pointer to it (rptr), it assembles, but it re-uses an input register, r2, which holds the address of a, effectively overwriting the input. (I know my arrays are 32 elements in size and that I'm only processing the first two elements; I plan to try looping, or loading more registers for parallel processing, next.)

void vecSum()
{
    //two input arrays of 32 bit types, one output
    int32_t a[32];
    int32_t b[32];
    int32_t r[32];

    //initialize
    for(int cnt = 0; cnt < 32; cnt++)
    {
        a[cnt] = 0x33333333;
        b[cnt] = 0x11111111;
        r[cnt] = 0;
    }

    void *rptr = r;

    __asm__ volatile(
    "vld1.32 {d0},[%[ina]]!\n"  //load the neon register with our data at a, post increment the reg
    "vld1.32 {d1},[%[inb]]!\n"
    "vqadd.s32 d0,d1\n"        //perform the sat
    "vst1.32 d0,[%[result]]\n" //store the answer
    : [result]"=r" (rptr) /*r*/
    : [ina] "r" (a), [inb] "r" (b)
    : /*"d0", "d1", "d2"*/);

    for(int g=0; g < 32; g++)
    {
        printf("0x[%d]%x ",g,a[g]);
    }    

}

Objdump:

for(int cnt = 0; cnt < 32; cnt++)
 780:   e3530080    cmp r3, #128    ; 0x80
 784:   1afffff7    bne 768 <_Z8vecSum32v+0x28>
"vld1.32 {d1},[%[inb]]!\n"
"vqadd.s32 d0,d1\n" //perform the sat
"vst1.32 d0,[%[result]]\n"
: [result]"=r" (rptr)
: [ina] "r" (a), [inb] "r" (b)
: /*"d0", "d1", "d2"*/);
 788:   f422078f    vld1.32 {d0}, [r2]
 78c:   f421178d    vld1.32 {d1}, [r1]!
 790:   f2200011    vqadd.s32   d0, d0, d1
 794:   f402078f    vst1.32 {d0}, [r2]

In summary, if I try vst1.32 d0,[%[result]] where result is the array r itself, I get a compilation error. If I use rptr (another pointer to r) it compiles, but uses r2 (which holds the address of a) for the output.

Can anybody explain why I get the error when outputting to r? And why does the pointer to r end up in the register that holds a?

Upvotes: 0

Views: 1084

Answers (2)

Notlikethat

Reputation: 20924

Consider if the asm contained add %[result], %[ina], %[inb]. There's no harm whatsoever in allocating r2 for both result and ina there. Since GCC doesn't go analysing the contents of the asm statement, its default assumption is that it contains a single instruction like that, so if yours is more complicated then you need to say so in order for things to work as expected.

Specifically, to prevent the problematic overlapping register allocation here, you need to be honest about the fact that your asm modifies the input registers - most simply via the + modifier (which then actually makes them outputs as far as GCC is concerned). Another unpleasant side effect of not doing that is that the compiler would assume that e.g. r1 still holds the address of b afterwards, and may generate later code relying on that, which will then go horribly wrong thanks to what the asm actually did.

Furthermore, you don't modify the result pointer, and only use its value as an input, so saying it's a write-only output operand is very wrong.

As for the issue with r, well, by specifying it as an output operand, you're saying that the asm writes a value back to that variable. Except you can't do that with an array variable in C (<languagelawyer> arrays are not modifiable lvalues) - you need to give the asm a variable which holds the address of the array and can be assigned back to, i.e. a pointer variable. The reason you can use the arrays directly as input operands is that input operands are expressions, not variables, and an expression that evaluates to an array is automatically converted to a pointer to the first element of that array (but is still not an lvalue </languagelawyer>).

All in all then, with appropriate pointer variables for a and b, suitable operands and constraints for this code as-is would look more like this:

: [ina] "+r" (aptr), [inb] "+r" (bptr)
: [result] "r" (r)
: "d0", "d1", "memory" /* getting clobbers right is also important */
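
For context, here is a minimal sketch of how those operand lists might slot into a complete function. The function name sat_add_pair and the scalar fallback for non-ARM builds are assumptions added for illustration, not part of the original code; the sketch also assumes r does not alias a or b.

```c
#include <stdint.h>

/* Hypothetical helper: saturating add of the first two 32-bit lanes of a and b into r. */
static void sat_add_pair(const int32_t *a, const int32_t *b, int32_t *r)
{
#if defined(__arm__) && (defined(__ARM_NEON) || defined(__ARM_NEON__))
    /* Pointer variables, because the asm post-increments them and
       "+r" operands must be assignable. */
    const int32_t *aptr = a;
    const int32_t *bptr = b;

    __asm__ volatile(
        "vld1.32 {d0}, [%[ina]]!\n\t"    /* load two lanes from a, post-increment */
        "vld1.32 {d1}, [%[inb]]!\n\t"    /* load two lanes from b, post-increment */
        "vqadd.s32 d0, d0, d1\n\t"       /* saturating add */
        "vst1.32 {d0}, [%[result]]\n\t"  /* store two lanes to r */
        : [ina] "+r" (aptr), [inb] "+r" (bptr)  /* read and modified by the asm */
        : [result] "r" (r)                      /* address is only read */
        : "d0", "d1", "memory");                /* scratch registers, memory written */
#else
    /* Assumed scalar fallback so the sketch also builds on non-NEON targets. */
    for (int i = 0; i < 2; i++) {
        int64_t s = (int64_t)a[i] + (int64_t)b[i];
        if (s > INT32_MAX) s = INT32_MAX;
        if (s < INT32_MIN) s = INT32_MIN;
        r[i] = (int32_t)s;
    }
#endif
}
```

Called as sat_add_pair(a, b, r) with the arrays from the question, the array arguments decay to pointers at the call, so the asm only ever sees pointer variables.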

Side note: if you just want to get to grips with NEON instructions rather than fighting with GCC, intrinsics are an alternative to consider.
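
As a rough illustration of the intrinsics route, the same saturating add could be written with the arm_neon.h intrinsics vld1q_s32, vqaddq_s32 and vst1q_s32, leaving register allocation to the compiler. The function names below and the scalar tail loop (also used as the fallback on non-NEON targets) are assumptions for this sketch.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Scalar saturating add, used for leftover elements and non-NEON builds. */
static int32_t sat_add_s32(int32_t x, int32_t y)
{
    int64_t s = (int64_t)x + (int64_t)y;
    if (s > INT32_MAX) return INT32_MAX;
    if (s < INT32_MIN) return INT32_MIN;
    return (int32_t)s;
}

/* Hypothetical helper: r[i] = saturate(a[i] + b[i]) for i in [0, n). */
static void vec_sat_add(const int32_t *a, const int32_t *b, int32_t *r, size_t n)
{
    size_t i = 0;
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    /* Four 32-bit lanes per iteration in a q register. */
    for (; i + 4 <= n; i += 4) {
        int32x4_t va = vld1q_s32(a + i);
        int32x4_t vb = vld1q_s32(b + i);
        vst1q_s32(r + i, vqaddq_s32(va, vb));
    }
#endif
    /* Remaining elements (or all of them without NEON). */
    for (; i < n; i++)
        r[i] = sat_add_s32(a[i], b[i]);
}
```

With intrinsics there are no constraints or clobber lists to get wrong, which is the main attraction when the goal is to learn the instructions themselves.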

Upvotes: 1

Timothy Baldwin

Reputation: 3675

rptr is declared as an output when it should be an input and "memory" is missing from the clobber list.

Alternatively you may put the arrays in structs and use the structs (rather than pointers) as arguments to the asm statement.
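
One possible reading of the struct suggestion, sketched under assumptions: wrap the lanes in a struct and pass the struct objects as "m" operands, so GCC knows exactly which memory the asm reads and writes and the blanket "memory" clobber is no longer needed. The addresses are still passed in registers here, because vld1/vst1 only accept a plain [Rn] address. The names lanes2 and sat_add2 and the scalar fallback are hypothetical.

```c
#include <stdint.h>

/* Hypothetical wrapper type: two 32-bit lanes, matching one d register. */
struct lanes2 { int32_t v[2]; };

static void sat_add2(const struct lanes2 *a, const struct lanes2 *b, struct lanes2 *r)
{
#if defined(__arm__) && (defined(__ARM_NEON) || defined(__ARM_NEON__))
    __asm__(
        "vld1.32 {d0}, [%[pa]]\n\t"
        "vld1.32 {d1}, [%[pb]]\n\t"
        "vqadd.s32 d0, d0, d1\n\t"
        "vst1.32 {d0}, [%[pr]]\n\t"
        : "=m" (*r)                      /* the asm writes the whole struct *r */
        : [pa] "r" (a), [pb] "r" (b), [pr] "r" (r),
          "m" (*a), "m" (*b)             /* the asm reads the structs *a and *b */
        : "d0", "d1");
#else
    /* Assumed scalar fallback so the sketch also builds on non-NEON targets. */
    for (int i = 0; i < 2; i++) {
        int64_t s = (int64_t)a->v[i] + (int64_t)b->v[i];
        if (s > INT32_MAX) s = INT32_MAX;
        if (s < INT32_MIN) s = INT32_MIN;
        r->v[i] = (int32_t)s;
    }
#endif
}
```

The "m" operands are not referenced in the template; they exist only to describe the memory dependencies. Because no pointer is post-incremented in this variant, all three addresses can stay plain inputs.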

Upvotes: 1
