Bram

Reputation: 8283

How do I do AVX vector blending with clang native vector syntax (no intrinsics)?

To my delight, I found that clang will let you write explicit vector code, without resorting to intrinsics, using extended vectors.

For instance, this code:

typedef float floatx16 __attribute__((ext_vector_type(16)));

floatx16 add( floatx16 a, floatx16 b )
{
    return a+b;
}

...compiles to a single instruction when built with clang -march=skylake-avx512:

vaddps  zmm0, zmm0, zmm1

In order to write branch-free code, I want to blend AVX512 vectors. With intrinsics, you would use the _mm512_mask_blend_ps intrinsic. (By the way, why does AVX512 use mask, a, b order while AVX uses a, b, mask order?)

Trying to do the blend with the ternary operator does not work:

typedef float floatx16 __attribute__((ext_vector_type(16)));

floatx16 minimum( floatx16 a, floatx16 b )
{
    return a < b ? a : b;
}

...results in...

error: used type 'int __attribute__((ext_vector_type(16)))' (vector of 16 'int' values) where arithmetic or pointer type is required

Is it possible to do vector blending, vblendmps zmm {k}, zmm, zmm, using ext_vector_type(16) variables in C?

Upvotes: 3

Views: 867

Answers (1)

Bram

Reputation: 8283

(This is the comment by @chtz in answer-form:)

There are at least two different ways to do vector types:

Form A:

__attribute__((ext_vector_type(numelements)))

Form B:

__attribute__((vector_size(numbytes)))
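As a minimal sketch of how the two forms look as complete declarations for a 16-float vector (the typedef names floatx16_ext and floatx16_gcc are made up for illustration, and form A is guarded because it is a clang extension):

```c
/* Form A: the argument is an element count (16 floats). */
#if defined(__clang__)
typedef float floatx16_ext __attribute__((ext_vector_type(16)));
#endif

/* Form B: the argument is a size in bytes (16 floats * 4 bytes = 64). */
typedef float floatx16_gcc __attribute__((vector_size(64)));
```

Note the different units: form A counts elements, form B counts bytes.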

When using form A, the expression c ? x : y will cause a compile error with clang 11.

Worse than that, gcc 10 will just silently pretend that ext_vector_type(N) has 4 elements even if N is 8 or 16.

When using form B, the expression c ? x : y is properly translated into a vector blend by clang 11. Clang 10 and gcc 10 also compile it, but they generate different code instead of a blend.

It is unclear to me why the ext_vector_type form exists, especially considering how badly it works.

UPDATE 1

Ugh... this only works in C++ but not in C. WHY???

UPDATE 2

The difference in behaviour is by specification: GCC's vector extension documentation lists the vector ternary operator ?: as available in C++ only.
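For plain C, one possible workaround is to build the blend from bitwise operations on the comparison mask. This is only a sketch, assuming form B (vector_size) types and assuming that a C-style cast between same-sized vector types reinterprets the bits, as gcc and clang do. The helper name blend and the intx16 typedef are my own and not part of any library:

```c
#include <stdint.h>

/* Form B vector types: 64 bytes = 16 lanes of 32 bits each. */
typedef float   floatx16 __attribute__((vector_size(64)));
typedef int32_t intx16   __attribute__((vector_size(64)));

/* Hypothetical helper: per lane, pick a where the mask lane is all ones
   and b where the mask lane is zero. */
static inline floatx16 blend(intx16 mask, floatx16 a, floatx16 b)
{
    /* The casts reinterpret the float bits as integers and back. */
    return (floatx16)(((intx16)a & mask) | ((intx16)b & ~mask));
}

floatx16 minimum(floatx16 a, floatx16 b)
{
    /* A vector comparison yields -1 (all bits set) in true lanes, 0 otherwise. */
    intx16 mask = (intx16)(a < b);
    return blend(mask, a, b);
}
```

Whether the compiler turns the and/or sequence into a single masked blend with -march=skylake-avx512 is up to the optimizer, so it is worth checking the generated assembly.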

Upvotes: 3
