Reputation: 113
I want to load `__m256` directly from Armadillo vector data obtained with `.memptr()`.
Does Armadillo ensure that the data memory is 256-bit (32-byte) aligned? If it is, I would just cast the `float`/`double` pointer returned by `.memptr()` to an `__m256` pointer and skip the `_mm256_load_ps()`, if that makes sense in terms of performance.
Upvotes: 0
Views: 288
Reputation: 50348
The Armadillo documentation does not seem to address this point, so it is left unspecified. Thus, vector data is likely not guaranteed to be 32-byte aligned.
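Since the alignment is unspecified, a cheap runtime check is one way to see what a given build actually does. The sketch below uses a hypothetical helper `is_aligned` (it is not part of Armadillo). With Armadillo you would call it as `is_aligned(v.memptr(), 32)` for some vector `v`.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not an Armadillo API): returns true if the address p
// is a multiple of `alignment` bytes. Use 32 for __m256 / __m256d.
inline bool is_aligned(const void* p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}
```

Observing an aligned pointer once does not turn it into a guarantee: the result may differ, for example, between small and large vectors or between Armadillo versions, so code that depends on alignment should keep the check or fall back to unaligned loads.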
However, you do not need the vector data to be aligned to load it into AVX registers: you can use the unaligned load intrinsic `_mm256_loadu_ps`. AFAIK, the performance of `_mm256_load_ps` and `_mm256_loadu_ps` is about the same on relatively recent x86 processors when the address happens to be aligned.
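A minimal sketch of the unaligned approach, assuming single-precision data (e.g. an `arma::fvec`; for `double` the analogous type and intrinsic are `__m256d` and `_mm256_loadu_pd`). The function name `sum_avx` is hypothetical. It also avoids the pointer cast from the question: dereferencing a `float*` cast to `__m256*` is treated by the compiler as an aligned 32-byte access, so it can fault on misaligned data just like `_mm256_load_ps`. The GCC/Clang `target("avx")` attribute enables AVX for this one function, so the rest of the program can be built without `-mavx`; callers should check for AVX support at runtime.

```cpp
#include <immintrin.h>
#include <cstddef>

// Sums n floats starting at p. p does NOT need to be 32-byte aligned.
// target("avx") is a GCC/Clang extension that enables AVX for this function only.
__attribute__((target("avx")))
float sum_avx(const float* p, std::size_t n) {
    __m256 acc = _mm256_setzero_ps();
    std::size_t i = 0;
    // Main loop: 8 floats per iteration via an unaligned load.
    for (; i + 8 <= n; i += 8) {
        __m256 x = _mm256_loadu_ps(p + i);  // safe for any address
        acc = _mm256_add_ps(acc, x);
    }
    // Horizontal sum of the 8 accumulator lanes.
    float lanes[8];
    _mm256_storeu_ps(lanes, acc);
    float s = 0.0f;
    for (int k = 0; k < 8; ++k) s += lanes[k];
    // Scalar tail for the remaining n % 8 elements.
    for (; i < n; ++i) s += p[i];
    return s;
}
```

With Armadillo, a call would look like `sum_avx(v.memptr(), v.n_elem)`, guarded by a runtime check such as GCC/Clang's `__builtin_cpu_supports("avx")`.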
Upvotes: 1