Ricky

Reputation: 1683

16 byte memory alignment using SSE instructions

I am trying to get rid of unaligned loads and stores for SSE instructions in my application by replacing

_mm_loadu_ps()

by

_mm_load_ps()

and allocating memory with:

float *ptr = (float *) _mm_malloc(h*w*sizeof(float),16)

instead of:

float *ptr = (float *) malloc(h*w*sizeof(float))

However, when I print the pointer addresses using:

printf("%p\n", &ptr)

I get output:

0x2521d20
0x2521d28
0x2521d30
0x2521d38
0x2521d40
0x2521d48
...

These addresses are not 16-byte aligned, even though I used the _mm_malloc function. And when I use the aligned load/store operations for the SSE instructions, I get a segmentation fault, since the data isn't 16-byte aligned.

Any ideas why it isn't aligned properly or any other ideas to fix this?

Thanks in advance!


Update

Using

printf("%p\n",ptr)

solved the problem with the memory alignment; the data is indeed properly aligned.

However, I still get a segmentation fault when trying to do an aligned load/store on this data, and I suspect it's a pointer issue.

When allocating the memory:

contents* instance;
instance.values = (float *) _mm_malloc(h*w*sizeof(float),16);    

I have a struct with:

typedef struct{
  ...
  float** values;
  ...
}contents;

Then, in another function that takes a pointer to contents as an argument, I execute:

__m128 tmp = _mm_load_ps(&contents.values);

Do you guys see anything I am missing? Thanks for all the help so far :)

Upvotes: 4

Views: 3586

Answers (1)

Paul R

Reputation: 212929

Change:

printf("%p\n", &ptr)

to:

printf("%p\n", ptr)

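It's the memory that ptr is pointing to that needs to be 16-byte aligned, not the actual pointer variable itself. Printing &ptr shows where the pointer variable is stored, which says nothing about the alignment of the buffer returned by _mm_malloc.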
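To see the difference in practice, here is a minimal sketch that prints both addresses and only does an aligned load/store when the buffer address is a multiple of 16. The helper names is_aligned_16 and check_aligned_buffer are made up for this illustration and are not from the question; the sketch also assumes a compiler whose xmmintrin.h provides _mm_malloc and _mm_free (GCC and Clang do).

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <xmmintrin.h>  /* SSE intrinsics; assumed to also provide _mm_malloc/_mm_free */

/* Hypothetical helper: nonzero when the address p is a multiple of 16. */
static int is_aligned_16(const void *p)
{
    return ((uintptr_t)p % 16) == 0;
}

/* Hypothetical demo: allocates n floats (n >= 4), prints the buffer address
   and the address of the pointer variable, then does one aligned load/store.
   Returns 1 if the buffer was aligned and the round trip gave the expected value. */
static int check_aligned_buffer(size_t n)
{
    assert(n >= 4);

    float *ptr = (float *)_mm_malloc(n * sizeof(float), 16);
    if (ptr == NULL)
        return 0;

    for (size_t i = 0; i < n; i++)
        ptr[i] = (float)i;

    /* ptr is the buffer address; this is what must be 16-byte aligned. */
    printf("buffer   (ptr):  %p\n", (void *)ptr);
    /* &ptr is where the pointer variable itself lives; its alignment is irrelevant here. */
    printf("variable (&ptr): %p\n", (void *)&ptr);

    int ok = is_aligned_16(ptr);
    if (ok) {
        __m128 v = _mm_load_ps(ptr);           /* aligned load of ptr[0..3] */
        _mm_store_ps(ptr, _mm_add_ps(v, v));   /* store doubled values back */
        ok = (ptr[3] == 6.0f);                 /* 3.0f doubled */
    }

    _mm_free(ptr);  /* memory from _mm_malloc must be released with _mm_free */
    return ok;
}
```

Calling check_aligned_buffer(8) from main should print two different addresses and return 1. The same rule applies to the load itself: _mm_load_ps has to be given the buffer pointer, not the address of a variable or struct member that holds it.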

Upvotes: 4
