Reputation: 3325
Is there an Intel SSE instruction which can load floats from (non-contiguous) evenly spaced memory addresses?
For example, given an array A = {0, 1, 2, 3, ..., n}, I would like to load into a 128-bit register at once {A[0], A[4], A[8], A[12]}, followed by {A[5], A[9], A[13], A[17]}.
Upvotes: 2
Views: 461
Reputation: 212969
In this kind of use case you would typically load multiple contiguous vectors and then permute them into the required arrangement using e.g. pshufd or punpckldq.
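As a rough illustration of the load-and-permute approach for stride 4, here is a minimal sketch using SSE intrinsics. The helper name load_stride4 and the out layout are made up for this example, and it assumes at least 16 readable floats starting at a. It uses the _MM_TRANSPOSE4_PS macro from xmmintrin.h, which does the permutation with the float-domain unpack/shuffle operations rather than pshufd/punpckldq directly.

```c
#include <xmmintrin.h>  /* SSE: _mm_loadu_ps, _mm_storeu_ps, _MM_TRANSPOSE4_PS */

/* Hypothetical helper (not from the original answer).
 * Reads a[0..15] and produces four stride-4 vectors:
 *   out[k] = { a[k], a[k+4], a[k+8], a[k+12] }  for k = 0..3 */
static void load_stride4(const float *a, float out[4][4])
{
    /* Four contiguous (unaligned) loads covering 16 floats. */
    __m128 r0 = _mm_loadu_ps(a + 0);   /* a[0]  a[1]  a[2]  a[3]  */
    __m128 r1 = _mm_loadu_ps(a + 4);   /* a[4]  a[5]  a[6]  a[7]  */
    __m128 r2 = _mm_loadu_ps(a + 8);   /* a[8]  a[9]  a[10] a[11] */
    __m128 r3 = _mm_loadu_ps(a + 12);  /* a[12] a[13] a[14] a[15] */

    /* In-register 4x4 transpose: r0 becomes { a[0], a[4], a[8], a[12] },
     * r1 becomes { a[1], a[5], a[9], a[13] }, and so on. */
    _MM_TRANSPOSE4_PS(r0, r1, r2, r3);

    _mm_storeu_ps(out[0], r0);
    _mm_storeu_ps(out[1], r1);
    _mm_storeu_ps(out[2], r2);
    _mm_storeu_ps(out[3], r3);
}
```

Calling load_stride4(A, out) leaves {A[0], A[4], A[8], A[12]} in out[0]; in real code you would probably keep the transposed registers and use them directly instead of storing them back to memory.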
Note that with AVX2 (Haswell and later) there are gather load instructions (e.g. vgatherdps, exposed as the _mm_i32gather_ps intrinsic), which might also be worth considering.
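For comparison, a gather-based version might look like the sketch below. The helper name gather_stride4 is hypothetical, the target attribute is GCC/Clang-specific (alternatively compile with -mavx2), and the code must only be run on a CPU that actually supports AVX2. Whether this beats load-and-permute depends on the microarchitecture, so it is worth benchmarking both.

```c
#include <immintrin.h>  /* AVX2: _mm_i32gather_ps */

/* Hypothetical helper (not from the original answer).
 * Gathers { base[0], base[4], base[8], base[12] } into out[0..3].
 * The target attribute enables AVX2 for this function only (GCC/Clang). */
__attribute__((target("avx2")))
static void gather_stride4(const float *base, float out[4])
{
    /* Element indices, in units of floats; the scale argument of 4
     * converts them to byte offsets (sizeof(float) == 4). */
    const __m128i idx = _mm_setr_epi32(0, 4, 8, 12);
    __m128 v = _mm_i32gather_ps(base, idx, 4);
    _mm_storeu_ps(out, v);
}
```

With this, gather_stride4(A, out) yields {A[0], A[4], A[8], A[12]}, and gather_stride4(A + 5, out) yields {A[5], A[9], A[13], A[17]} from the question.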
Upvotes: 3