PGOnTheGo

Reputation: 805

Working with Intel SSE SIMD intrinsics

I have a question about the arithmetic operations in the Intel SSE intrinsics. What is the difference between _mm_add_ps and _mm_add_epi8/16/32? I also want to make sure that my data is aligned at all times.

In some sample code, when I do this:

 __m128 u1 = _mm_load_ps(&V[(i-1)]);

I get a segmentation fault. But when I do this:

 __m128 u1 = _mm_loadu_ps(&V[(i-1)]);

It works fine.

Since I want my data aligned, I allocated the array like this:

 posix_memalign((void**)&V, 16, dx*sizeof(float));

Can someone help explain this?

Upvotes: 0

Views: 1196

Answers (1)

Tony The Lion

Reputation: 63200

_mm_add_ps adds four packed single-precision floats lane by lane, whereas _mm_add_epi8/16/32 add packed 8-, 16-, or 32-bit integers.
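To make the difference concrete, here is a minimal sketch that adds four floats with _mm_add_ps and four 32-bit integers with _mm_add_epi32. The helper names add4_float and add4_int32 are made up for illustration, and the unaligned load/store variants are used so the example works for any address.

```c
#include <emmintrin.h>  /* SSE2: __m128i, _mm_add_epi32 */
#include <stdint.h>
#include <xmmintrin.h>  /* SSE: __m128, _mm_add_ps */

/* Hypothetical helper: out[k] = a[k] + b[k] for four floats. */
static void add4_float(const float *a, const float *b, float *out)
{
    __m128 va = _mm_loadu_ps(a);  /* unaligned load, any address is fine */
    __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(out, _mm_add_ps(va, vb));
}

/* Hypothetical helper: out[k] = a[k] + b[k] for four 32-bit integers,
   wrapping around on overflow. */
static void add4_int32(const int32_t *a, const int32_t *b, int32_t *out)
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, _mm_add_epi32(va, vb));
}
```

The same 128-bit register holds 16, 8, or 4 integer lanes for the epi8, epi16, and epi32 variants respectively, so the suffix tells you how the bits are split up.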

_mm_loadu_ps does not require the address to be 16-byte (128-bit) aligned, whereas _mm_load_ps does require 16-byte alignment.

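So if you get a segmentation fault with _mm_load_ps but not with _mm_loadu_ps, the address you are loading from is not 16-byte aligned. Keep in mind that a 16-byte-aligned V only guarantees that &V[k] is aligned when k is a multiple of 4, since a float is normally 4 bytes.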
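One way to check this in your own code is to test the address before choosing the load. This is only a sketch: is_aligned16 and load4 are hypothetical helper names, and it assumes a 4-byte float.

```c
#include <stdint.h>
#include <xmmintrin.h>

/* Hypothetical helper: returns 1 if p sits on a 16-byte boundary, else 0. */
static int is_aligned16(const void *p)
{
    return ((uintptr_t)p % 16) == 0;
}

/* Hypothetical helper: loads four floats starting at p, using the aligned
   load only when the address allows it. */
static __m128 load4(const float *p)
{
    return is_aligned16(p) ? _mm_load_ps(p) : _mm_loadu_ps(p);
}
```

With a 16-byte-aligned V, is_aligned16(&V[0]) and is_aligned16(&V[4]) hold, but is_aligned16(&V[1]) does not, which matches a loop that reads from &V[i-1] for arbitrary i.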

On the posix_memalign page it says this:

The posix_memalign() function shall fail if:

[EINVAL] The value of the alignment parameter is not a power of two multiple of sizeof(void *).

Note that sizeof(float) == sizeof(void*) is not guaranteed. The two happen to match on a typical 32-bit system, but a pointer is normally as wide as a CPU register (4 bytes on a 32-bit system, 8 bytes on a 64-bit one), whereas a float is normally 32 bits (4 bytes). The requirement quoted above applies to the alignment argument, and 16 satisfies it on both kinds of system.

If you want to size the allocation by pointer width instead, it would look like this:

posix_memalign((void**)&V, 16, dx*sizeof(void*)); // sized by the platform's pointer width; V can still be used as a float*

Be aware that this only changes how many bytes are allocated (more than needed on a 64-bit system). It does not change the alignment of the returned block.
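Whichever size you pass, it is worth checking the return value, because posix_memalign reports failure through its return code rather than through errno. Here is a minimal sketch of a wrapper; alloc_floats16 is a made-up name, and it assumes the buffer holds floats, so the byte count is computed from sizeof(float).

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L  /* expose posix_memalign under strict -std modes */
#endif
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: returns room for n floats on a 16-byte boundary,
   or NULL if the allocation fails. Release the result with free(). */
static float *alloc_floats16(size_t n)
{
    void *p = NULL;
    /* posix_memalign returns 0 on success and an error number otherwise. */
    if (posix_memalign(&p, 16, n * sizeof(float)) != 0)
        return NULL;
    return (float *)p;
}
```

Passing a void* variable and casting the result afterwards also avoids the (void**)&V cast on a float* variable.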

Upvotes: 4
