Reputation: 1683
I am trying to eliminate unaligned SSE loads and stores in my application by replacing
_mm_loadu_ps()
by
_mm_load_ps()
and allocating memory with:
float *ptr = (float *) _mm_malloc(h*w*sizeof(float),16)
instead of:
float *ptr = (float *) malloc(h*w*sizeof(float))
However, when I print the pointer addresses using:
printf("%p\n", &ptr)
I get output:
0x2521d20
0x2521d28
0x2521d30
0x2521d38
0x2521d40
0x2521d48
...
Not all of these addresses are 16-byte aligned, even though I used the _mm_malloc function. And when I use the aligned load/store operations for the SSE instructions, I get a segmentation fault, since the data isn't 16-byte aligned.
Any ideas why it isn't aligned properly or any other ideas to fix this?
Thanks in advance!
Update: using
printf("%p\n",ptr)
resolved the apparent memory alignment problem; the data is indeed properly aligned.
However, I still get a segmentation fault when doing an aligned load/store on this data, and I suspect it's a pointer issue.
I allocate the memory like this:
contents* instance;
instance.values = (float *) _mm_malloc(h*w*sizeof(float),16);
The struct is defined as:
typedef struct{
...
float** values;
...
}contents;
In another function, which receives a pointer to contents as an argument, I then execute:
__m128 tmp = _mm_load_ps(&contents.values);
Do you see anything I am missing? Thanks for all the help so far :)
Upvotes: 4
Views: 3586
Reputation: 212929
Change:
printf("%p\n", &ptr)
to:
printf("%p\n", ptr)
It's the memory that ptr points to that needs to be 16-byte aligned, not the pointer variable itself. &ptr is the address of the variable ptr, which tells you nothing about where the allocated buffer lives.
Upvotes: 4