shd

Reputation: 1119

Writing assembly code in C++

I have the following code in C++:

inline void armMultiply(const float* __restrict__ src1,
                        const float* __restrict__ src2,
                        float* __restrict__ dst)
{
    __asm volatile(
                 "vld1.f32 {q0}, [%[src1]:128]!      \n\t"
                 :
                 :[dst] "r" (dst), [src1] "r" (src1), [src2] "r" (src2)
                 );
}

Why do I get the error "vector register expected"?

Upvotes: 0

Views: 723

Answers (2)

mstorsjo

Reputation: 13317

You're getting this error because your inline assembly is written for 32-bit ARM, but you're compiling for 64-bit ARM (with clang; with gcc you would have gotten a different error).

(Inline) assembly differs between 32-bit and 64-bit ARM, so you need to guard it with e.g. #if defined(__ARM_NEON__) && !defined(__aarch64__), or, if you want separate assembly for 64-bit and 32-bit: #ifdef __aarch64__ .. #elif defined(__ARM_NEON__), etc.

As others commented, unless you really need to hand-tune the generated assembly, intrinsics can be just as good (and in some cases better than what you would write yourself). You can e.g. do the two vld1_f32 calls, one vmul_f32 and one vst1_f32 via intrinsics just fine (or use the q variants vld1q_f32, vmulq_f32 and vst1q_f32 for the 128-bit registers used here).
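To make the intrinsics suggestion concrete, here is a minimal sketch of a four-float multiply. The function name multiply4 is hypothetical (it is not from the question), and the scalar fallback is an assumption added only so that the snippet also builds on targets without NEON. It uses the q variants of the intrinsics because the question loads into a 128-bit q register.

```cpp
#include <cstddef>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

// Hypothetical helper (name not from the question): multiplies four floats
// element-wise, i.e. dst[i] = src1[i] * src2[i] for i in 0..3.
inline void multiply4(const float* src1, const float* src2, float* dst)
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    // The q-suffixed intrinsics operate on 128-bit registers (four floats).
    // The same source compiles for both 32-bit and 64-bit ARM.
    float32x4_t a = vld1q_f32(src1);
    float32x4_t b = vld1q_f32(src2);
    vst1q_f32(dst, vmulq_f32(a, b));
#else
    // Scalar fallback (an assumption of this sketch) for non-NEON targets.
    for (std::size_t i = 0; i < 4; ++i)
        dst[i] = src1[i] * src2[i];
#endif
}
```

With intrinsics the compiler picks the registers and the instruction encoding itself, so no 32-bit versus 64-bit guard is needed around the NEON code.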

EDIT:

The corresponding inline assembly line for loading into a SIMD register on 64-bit ARM would be:

"ld1 {v0.4s}, [%[src1]], #16      \n\t"

To support both, your function could look like this instead:

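inline void armMultiply(const float* __restrict__ src1,
                        const float* __restrict__ src2,
                        float* __restrict__ dst)
{
#ifdef __aarch64__
    // 64-bit ARM: load four floats into v0, then advance src1 by 16 bytes.
    __asm volatile(
                 "ld1 {v0.4s}, [%[src1]], #16      \n\t"
                 :[src1] "+r" (src1)   // read-write: the post-increment modifies it
                 :[dst] "r" (dst), [src2] "r" (src2)
                 :"v0", "memory"       // tell the compiler what the asm touches
                 );
#elif defined(__ARM_NEON__)
    // 32-bit ARM: same load into q0; "!" writes the advanced address back.
    __asm volatile(
                 "vld1.f32 {q0}, [%[src1]:128]!      \n\t"
                 :[src1] "+r" (src1)   // read-write: the writeback modifies it
                 :[dst] "r" (dst), [src2] "r" (src2)
                 :"q0", "memory"       // tell the compiler what the asm touches
                 );
#else
#error this requires neon
#endif
}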
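The function above only shows the load. As a hedged sketch of how the remaining steps might look, the following version loads from both sources, multiplies and stores. The name armMultiply4, the instruction sequence and the scalar fallback are assumptions of this sketch, not part of the original answer. It uses no address writeback, so plain "r" inputs suffice, and it lists the vector registers and memory as clobbers.

```cpp
// Hypothetical completion (not from the original answer): computes
// dst[i] = src1[i] * src2[i] for four floats.
inline void armMultiply4(const float* src1, const float* src2, float* dst)
{
#if defined(__aarch64__)
    // 64-bit ARM: v registers with an explicit .4s arrangement.
    __asm volatile(
        "ld1  {v0.4s}, [%[src1]]     \n\t"
        "ld1  {v1.4s}, [%[src2]]     \n\t"
        "fmul v0.4s, v0.4s, v1.4s    \n\t"
        "st1  {v0.4s}, [%[dst]]      \n\t"
        :
        : [dst] "r" (dst), [src1] "r" (src1), [src2] "r" (src2)
        : "v0", "v1", "memory");
#elif defined(__ARM_NEON__) || defined(__ARM_NEON)
    // 32-bit ARM: q0 is the register pair d0/d1, q1 is d2/d3.
    __asm volatile(
        "vld1.32  {d0, d1}, [%[src1]] \n\t"
        "vld1.32  {d2, d3}, [%[src2]] \n\t"
        "vmul.f32 q0, q0, q1          \n\t"
        "vst1.32  {d0, d1}, [%[dst]]  \n\t"
        :
        : [dst] "r" (dst), [src1] "r" (src1), [src2] "r" (src2)
        : "q0", "q1", "memory");
#else
    // Scalar fallback (an assumption of this sketch) for non-NEON targets.
    for (int i = 0; i < 4; ++i)
        dst[i] = src1[i] * src2[i];
#endif
}
```

The __aarch64__ check comes first because 64-bit compilers may also define the NEON macros, which would otherwise select the 32-bit branch.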

Upvotes: 1

Freddie Chopin

Reputation: 8860

Assuming we're talking about GCC, the docs say that you should be using "w" ("Floating point or SIMD vector register") instead of "r" ("register operand is allowed provided that it is in a general register") as the constraint.

https://gcc.gnu.org/onlinedocs/gcc-6.3.0/gcc/Machine-Constraints.html#Machine-Constraints

https://gcc.gnu.org/onlinedocs/gcc-6.3.0/gcc/Simple-Constraints.html#Simple-Constraints

Upvotes: 0
