Reputation: 5874
This might sound stupid but is there a way to activate support of the inner members of an SSE vector type ?
I know this works fine on MSVC, And I ve found some comments on forums and SO like this. The question, is can I activate this on CLang at least without creating my own unions ?
Thank you
[edit, workaround]
Currently I decided to create a vec4 type to help me. here is the code
#include <emmintrin.h>
#include <cstdint>
#ifdef _WIN32
typedef __m128 vec4;
typedef __m128i vec4i;
typedef __m128d vec4d;
#else
typedef union __declspec(align(16)) vec4{
float m128_f32[4];
uint64_t m128_u64[2];
int8_t m128_i8[16];
int16_t m128_i16[8];
int32_t m128_i32[4];
int64_t m128_i64[2];
uint8_t m128_u8[16];
uint16_t m128_u16[8];
uint32_t m128_u32[4];
} vec4;
typedef union __declspec(align(16)) vec4i{
uint64_t m128i_u64[2];
int8_t m128i_i8[16];
int16_t m128i_i16[8];
int32_t m128i_i32[4];
int64_t m128i_i64[2];
uint8_t m128i_u8[16];
uint16_t m128i_u16[8];
uint32_t m128i_u32[4];
} vec4i;
typedef union __declspec(align(16)) vec4d{
double m128d_f64[2];
} vec4d;
#endif
Upvotes: 0
Views: 1122
Reputation: 106197
On recent clangs, this Just Works without you needing to do anything at all:
#include <immintrin.h>
float foo(__m128 x) {
return x[1];
}
AFAIK it Just Works in recent GCC builds as well.
However, I should note the following:
Consider carefully whether or not you really need to do element-wise access in your vector code. If you can keep your operations in-lane, they will almost certainly be significantly more efficient.
If you really do need to do a significant number of lanewise or horizontal operations, and you don’t need portability, consider using Clang extended vectors (or “OpenCL vectors") instead of the basic SSE intrinsic types. You can pass them to intrinsics just like __m128
and friends, but they also have much nicer syntax for vector-scalar operations, lane wise operations, vector literals, etc.
Upvotes: 3