Elliot Gorokhovsky

Reputation: 3762

Generate FMOV without inline assembly

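I want to broadcast two 64-bit integers into SVE vectors, apply bdep to them, and move the low 64 bits of the result back into a general-purpose register with a plain fmov.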

This doesn't seem to be possible using ACLE intrinsics.

This is the closest I can get using intrinsics: https://godbolt.org/z/brjG6fe38

    const auto vec = svbdep_u64(svdup_n_u64(a), svdup_n_u64(b));
    return svlastb_u64(svptrue_b64(), vec);

which Clang compiles to

foo(unsigned long, unsigned long):
        mov     z0.d, x0
        ptrue   p0.d
        mov     z1.d, x1
        bdep    z0.d, z0.d, z1.d
        lastb   x0, p0, z0.d
        ret

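The compiler is able to replace dup with mov, which is great. However, it still generates ptrue and lastb, which is wasteful: both inputs are broadcast, so every 64-bit lane of the result holds the same value and I only need one of them, for example the low 64 bits. An fmov would do just fine.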

Am I missing something, or is this basic operation not supported by ACLE intrinsics?

Upvotes: 3

Views: 129

Answers (2)

Elliot Gorokhovsky

Reputation: 3762

It turns out there is a portable solution using the NEON-SVE bridge intrinsics, so the non-portable workaround from Peter Cordes is not necessary:

#include <arm_neon_sve_bridge.h>

uint64_t foo(uint64_t a, uint64_t b) {
    const auto vec = svbdep_u64(svdup_n_u64(a), svdup_n_u64(b));
    return vgetq_lane_u64(svget_neonq_u64(vec), 0);
}

See https://github.com/ARM-software/acle/issues/374#issuecomment-2568181600 for more context.

Godbolt: https://godbolt.org/z/d69zjGMEE
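
For checking results on a machine without SVE2, a scalar reference for bit deposit can help. The following is only a sketch: bdep_ref is a hypothetical helper name, and it assumes the operand order used in foo, where the first argument supplies the data bits and the second is the mask whose set bits select the destination positions.

```cpp
#include <cstdint>

// Hypothetical scalar reference for bit deposit (not part of ACLE).
// The low-order bits of `data` are placed, in order, at the positions
// of the set bits of `mask`; all other result bits are zero.
inline std::uint64_t bdep_ref(std::uint64_t data, std::uint64_t mask) {
    std::uint64_t result = 0;
    for (std::uint64_t bit = 1; mask != 0; bit <<= 1) {
        // Isolate the lowest set bit still remaining in the mask.
        const std::uint64_t lowest = mask & (~mask + 1);
        if (data & bit) {
            result |= lowest;
        }
        // Clear that mask bit and move on to the next data bit.
        mask &= mask - 1;
    }
    return result;
}
```

For example, with data 0x5 and mask 0x1A (set bits at positions 1, 3 and 4), data bits 0 and 2 land at positions 1 and 4, giving 0x12.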

Upvotes: 2

Peter Cordes

Reputation: 365457

GNU C native vector syntax allows indexing a vector with [].

return vec[0] compiles to fmov. I don't know SVE very well, and I haven't checked how Clang's <arm_sve.h> defines the vector types.

This is unlikely to be portable to other compilers, especially MSVC.

uint64_t foo(uint64_t a, uint64_t b) {
    const auto vec = svbdep_u64(svdup_n_u64(a), svdup_n_u64(b));
    return vec[0];
}

Godbolt:

foo(unsigned long, unsigned long):
        mov     z0.d, x0
        mov     z1.d, x1
        bdep    z0.d, z0.d, z1.d
        fmov    x0, d0
        ret
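
The [] indexing itself can be tried on any GCC or Clang target using a fixed-width GNU vector type instead of an SVE type. This is only a sketch of the language feature: u64x2 and first_lane are hypothetical names, the element-wise AND merely stands in for bdep, and whether the same subscript is accepted on SVE's sizeless types depends on the compiler.

```cpp
#include <cstdint>

// GNU C vector extension (GCC and Clang, not MSVC): two 64-bit lanes.
// `u64x2` is a hypothetical name chosen for this illustration.
typedef std::uint64_t u64x2 __attribute__((vector_size(16)));

std::uint64_t first_lane(std::uint64_t a, std::uint64_t b) {
    const u64x2 va = {a, a};    // broadcast a to both lanes
    const u64x2 vb = {b, b};    // broadcast b to both lanes
    const u64x2 vec = va & vb;  // element-wise AND, standing in for bdep
    return vec[0];              // subscript reads lane 0 directly
}
```

On AArch64, a compiler is free to lower the lane-0 read to a single register move, but the exact instruction chosen is up to the compiler and optimization level.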

Upvotes: 2
