Reputation: 2847
I have been having trouble with this toy example for learning to program with SSE intrinsics. I read in other threads here that segmentation faults with the _mm_load_ps function are sometimes caused by misaligned data, but I think that should be solved by the __attribute__((__aligned__(16))) that I used. Also, when I comment out either line 23 or line 24 (or both) in my code, the problem goes away, but obviously this makes the code not work.
#include <iostream>
using namespace std;
int main()
{
    float temp1[] __attribute__((__aligned__(16))) = {1.1,1.2,1.3,14.5,3.1,5.2,2.3,3.4};
    float temp2[] __attribute__((__aligned__(16))) = {1.2,2.3,3.4,3.5,1.2,2.3,4.2,2.2};
    float temp3[8];
    __m128 m, *m_result;
    __m128 arr1 = _mm_load_ps(temp1);
    __m128 arr2 = _mm_load_ps(temp2);
    m = _mm_mul_ps(arr1, arr2);
    *m_result = _mm_add_ps(m, m);
    _mm_store_ps(temp3, *m_result);
    for(int i = 0; i < 4; i++)
    {
        cout << temp3[i] << endl;
    }
    m_result++;
    arr1 = _mm_load_ps(temp1+4);
    arr2 = _mm_load_ps(temp2+4);
    m = _mm_mul_ps(arr1, arr2);
    *m_result = _mm_add_ps(m,m);
    _mm_store_ps(temp3, *m_result);
    for(int i = 0; i < 4; i++)
    {
        cout << temp3[i] << endl;
    }
    return 0;
}
Line 23 is arr1 = _mm_load_ps(temp1+4). It's weird to me that I can do one or the other but not both. Any help would be appreciated, thanks!
Upvotes: 3
Views: 3682
Reputation: 213059
(1) m_result is just a wild pointer:
    __m128 m, *m_result;
Change all occurrences of *m_result to m_result and get rid of the m_result++; statement. (m_result is just a temporary vector variable that you are subsequently storing to temp3.)
(2) Your two stores are potentially misaligned, since temp3 has no guaranteed alignment - either change:
    float temp3[8];
to:
    float temp3[8] __attribute__((__aligned__(16)));
or use _mm_storeu_ps:
    _mm_storeu_ps(temp3, m_result);
            ^^^
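A minimal sketch of both points, assuming an x86 target with SSE available. The helper names is_aligned16 and mul_double4 are made up for illustration and do not come from the question: the result is held in a plain __m128 value rather than a pointer, and the store uses _mm_storeu_ps so the destination needs no particular alignment.

```cpp
#include <xmmintrin.h>  // SSE intrinsics
#include <cassert>
#include <cstdint>

// Hypothetical helper: true when p sits on a 16-byte boundary,
// which is what _mm_load_ps and _mm_store_ps require.
inline bool is_aligned16(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}

// Hypothetical helper: out[i] = 2 * a[i] * b[i] for four floats.
// a and b must be 16-byte aligned; out may have any alignment.
inline void mul_double4(const float* a, const float* b, float* out) {
    assert(is_aligned16(a) && is_aligned16(b));
    __m128 m = _mm_mul_ps(_mm_load_ps(a), _mm_load_ps(b));
    __m128 result = _mm_add_ps(m, m);  // a plain __m128 value, no pointer needed
    _mm_storeu_ps(out, result);        // unaligned store: safe for any float*
}
```

If the destination is known to be 16-byte aligned, _mm_store_ps can be used instead of _mm_storeu_ps.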
Upvotes: 3
Reputation: 33679
Your problem is that you declare a pointer __m128 *m_result but you never allocate any space for it. Later you also do m_result++, which makes it point to another memory address that has not been allocated either. There is no reason to use a pointer here.
#include <xmmintrin.h> // SSE
#include <iostream>
using namespace std;
int main()
{
    float temp1[] __attribute__((__aligned__(16))) = {1.1,1.2,1.3,14.5,3.1,5.2,2.3,3.4};
    float temp2[] __attribute__((__aligned__(16))) = {1.2,2.3,3.4,3.5,1.2,2.3,4.2,2.2};
    // _mm_store_ps needs a 16-byte-aligned destination, so align temp3 as well
    float temp3[8] __attribute__((__aligned__(16)));
    __m128 m, m_result; // m_result is a plain vector value, not a pointer
    __m128 arr1 = _mm_load_ps(temp1);
    __m128 arr2 = _mm_load_ps(temp2);
    m = _mm_mul_ps(arr1, arr2);
    m_result = _mm_add_ps(m, m);
    _mm_store_ps(temp3, m_result);
    for(int i = 0; i < 4; i++)
    {
        cout << temp3[i] << endl;
    }
    arr1 = _mm_load_ps(temp1+4);
    arr2 = _mm_load_ps(temp2+4);
    m = _mm_mul_ps(arr1, arr2);
    m_result = _mm_add_ps(m,m);
    _mm_store_ps(temp3, m_result);
    for(int i = 0; i < 4; i++)
    {
        cout << temp3[i] << endl;
    }
    return 0;
}
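The two repeated blocks in the code above could also be folded into a loop that steps through the arrays four floats at a time. This is only a sketch: the function name mul_double is hypothetical, it assumes all three arrays are 16-byte aligned and that n is a multiple of 4, and unlike the code above it writes every result to its own slot in the output rather than reusing the first four elements.

```cpp
#include <xmmintrin.h>  // SSE intrinsics
#include <cassert>
#include <cstddef>

// Hypothetical helper: out[i] = 2 * a[i] * b[i] for n floats.
// Assumes a, b and out are 16-byte aligned and n is a multiple of 4.
inline void mul_double(const float* a, const float* b, float* out, std::size_t n) {
    assert(n % 4 == 0);
    for (std::size_t i = 0; i < n; i += 4) {
        // a + i stays 16-byte aligned because i advances in steps of 4 floats
        __m128 m = _mm_mul_ps(_mm_load_ps(a + i), _mm_load_ps(b + i));
        __m128 m_result = _mm_add_ps(m, m);  // vector value, no pointer involved
        _mm_store_ps(out + i, m_result);
    }
}
```

Called as mul_double(temp1, temp2, temp3, 8), this would fill all eight elements of temp3, provided temp3 is declared with the same alignment attribute as temp1 and temp2.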
Upvotes: 6