Reputation: 805
I have a question regarding the various arithmetic operations for Intel SSE intrinsics. What is the difference between _mm_add_ps and _mm_add_epi8/16/32? I also want to make sure that my data is aligned at all times.
In some sample code, when I do this:
__m128 u1 = _mm_load_ps(&V[(i-1)]);
I get a segmentation fault. But when I do this:
__m128 u1 = _mm_loadu_ps(&V[(i-1)]);
It works fine.
Since I want my data aligned, I allocated the array like this:
posix_memalign((void**)&V, 16, dx*sizeof(float));
Can someone help explain this?
Upvotes: 0
Views: 1196
Reputation: 63200
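_mm_add_ps adds four packed 32-bit floats together, whereas _mm_add_epi8/16/32 add packed integers (sixteen 8-bit, eight 16-bit, or four 32-bit values per 128-bit register), which are not floating-point numbers.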
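To make the difference concrete, here is a minimal sketch that adds two arrays lane by lane, once with the float intrinsic and once with the 32-bit integer one. The function names add_f32 and add_i32 are made up for illustration, the unaligned load/store variants are used so the sketch does not depend on how the arrays were allocated, and it assumes the length is a multiple of 4.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <emmintrin.h>  /* SSE2 integer intrinsics; also pulls in the SSE float ones */

/* out[i] = a[i] + b[i] for n floats, four lanes per step.
   Assumption of this sketch: n is a multiple of 4. */
void add_f32(const float *a, const float *b, float *out, size_t n)
{
    assert(n % 4 == 0);
    for (size_t i = 0; i < n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);            /* 4 floats, no alignment needed */
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb)); /* floating-point add */
    }
}

/* Same loop for 32-bit integers: a different register type (__m128i)
   and a different add instruction. */
void add_i32(const int32_t *a, const int32_t *b, int32_t *out, size_t n)
{
    assert(n % 4 == 0);
    for (size_t i = 0; i < n; i += 4) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        _mm_storeu_si128((__m128i *)(out + i), _mm_add_epi32(va, vb)); /* integer add */
    }
}
```

The two functions have the same shape; only the register type and the intrinsic family change with the element type.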
_mm_loadu_ps does not require the address of your floats to be 16-byte (128-bit) aligned, whereas _mm_load_ps does require 16-byte alignment.
So if you get a segfault with _mm_load_ps but not with _mm_loadu_ps, the address you are passing is not 16-byte aligned.
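One way to check this is to test the address before loading. The sketch below is only an illustration: is_aligned16, load4_aligned and load4_any are hypothetical helper names, not part of any SSE header.

```c
#include <assert.h>
#include <stdint.h>
#include <xmmintrin.h>

/* Returns 1 if p lies on a 16-byte boundary, 0 otherwise. */
int is_aligned16(const void *p)
{
    return ((uintptr_t)p % 16) == 0;
}

/* Aligned load that fails with a readable assertion (in debug builds)
   instead of a segfault when the address is misaligned. */
__m128 load4_aligned(const float *p)
{
    assert(is_aligned16(p));
    return _mm_load_ps(p);
}

/* Picks the aligned load when it is safe and the unaligned one otherwise. */
__m128 load4_any(const float *p)
{
    return is_aligned16(p) ? _mm_load_ps(p) : _mm_loadu_ps(p);
}
```

With a 16-byte aligned float array V, is_aligned16(&V[k]) is true only when k is a multiple of 4, since each float is 4 bytes.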
The posix_memalign page says this:
The posix_memalign() function shall fail if:
[EINVAL] The value of the alignment parameter is not a power of two multiple of sizeof( void *).
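Note that this requirement concerns the alignment argument, not the size. A pointer is normally 4 bytes on a 32-bit system and 8 bytes on a 64-bit system, and 16 is a power-of-two multiple of both, so your call does not fail with EINVAL. The size argument is just the number of bytes to allocate, and since a float is normally 32 bits (4 bytes), dx*sizeof(float) is the right size for dx floats.
Your aligned allocation is therefore fine as it stands:
posix_memalign((void**)&V, 16, dx*sizeof(float)); // V itself is 16-byte aligned
The problem is the address you load from. Only V itself is guaranteed to be 16-byte aligned; &V[k] is 16-byte aligned only when k is a multiple of 4 (4 floats = 16 bytes). So _mm_load_ps(&V[i-1]) can fault whenever i-1 is not a multiple of 4. Use _mm_loadu_ps for such offsets, or arrange the loop so that every aligned load starts at an index that is a multiple of 4.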
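For reference, here is a minimal sketch of an aligned float buffer together with a loop that only ever loads from 16-byte boundaries. The names alloc_floats16 and scale_aligned are hypothetical, the element count is assumed to be a multiple of 4, and the code assumes a compiler in its default (GNU) mode; strict ISO C modes may need a feature-test macro such as _POSIX_C_SOURCE defined before the includes for posix_memalign to be declared.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <xmmintrin.h>

/* Allocates n floats on a 16-byte boundary. The alignment is in bytes and
   the size is in bytes (n * sizeof(float)). Returns NULL on failure.
   Release the buffer with free(). */
float *alloc_floats16(size_t n)
{
    void *p = NULL;
    if (posix_memalign(&p, 16, n * sizeof(float)) != 0)
        return NULL;
    assert(((uintptr_t)p % 16) == 0);
    return (float *)p;
}

/* Multiplies every element by k using aligned loads and stores.
   Assumptions of this sketch: v is 16-byte aligned and n is a multiple of 4,
   so v + i is 16-byte aligned on every iteration. */
void scale_aligned(float *v, size_t n, float k)
{
    assert(((uintptr_t)v % 16) == 0 && n % 4 == 0);
    __m128 vk = _mm_set1_ps(k);
    for (size_t i = 0; i < n; i += 4) {
        __m128 x = _mm_load_ps(v + i);        /* i is a multiple of 4: aligned */
        _mm_store_ps(v + i, _mm_mul_ps(x, vk));
    }
}
```

A stencil-style access such as &V[i-1] does not fit this pattern, because neighbouring offsets fall between 16-byte boundaries; those loads would use _mm_loadu_ps instead.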
Upvotes: 4