Reputation: 561
We are trying to do some SSE operations. However, at the end of the add_sse function, when we try to read back the values just computed, we get a seg fault. But if we just print the values in the for loop, the result is OK. It is also OK to read just element 0 of each array; reading element 1 and beyond causes a seg fault.
Could anyone help us identify the problem? We have tried everything, but still don't understand why there would be a seg fault. Thanks.
void main()
{
    ResultCounter *c_sse=(ResultCounter *)memalign(16,sizeof(ResultCounter)*4);
    resetCounter (c_sse); //initial struct to all 0
    add_sse (1,2, 3,4, c_sse);
}
void add_sse (unsigned int first, unsigned int second, unsigned int third, unsigned int fourth, ResultCounter *c)
{
    __attribute__((align(16))) int m_intarray[4] = {first, second, third,fourth};
    __attribute__((align(16))) int m_Larray[4] = {c[0].L, c[1].L, c[2].L,c[3].L};
    __attribute__((align(16))) int m_Marray[4] = {c[0].M, c[1].M, c[2].M,c[3].M};
    __attribute__((align(16))) int m_Harray[4] = {c[0].H, c[1].H, c[2].H,c[3].H};
    __m128i N = _mm_load_si128(&m_intarray[0]);
    __m128i L = _mm_load_si128(&m_Larray[0]);
    __m128i M = _mm_load_si128(&m_Marray[0]);
    __m128i H = _mm_load_si128(&m_Harray[0]);
    __m128i Lcarry = _mm_and_si128 (L, N);
    L = _mm_xor_si128 (L, N);
    __m128i Mcarry = _mm_and_si128 (M, Lcarry);
    M = _mm_xor_si128 (M, Lcarry);
    H = _mm_or_si128 (H,Mcarry);
    _mm_store_si128(&m_Larray[0], L);
    _mm_store_si128(&m_Marray[0], M);
    _mm_store_si128(&m_Harray[0], H);
    for(i = 0; i < 4; i++) {
        //printf ("L:%d,addr=%u,M:%u,addr=%u,H:%u,addr=%u\n",m_Larray[i],&m_Larray[i],m_Marray[i],&m_Marray[i],m_Harray[i],&m_Harray[i]);
        c[i].L=m_Larray[i];
        c[i].M=m_Marray[i];
        c[i].H=m_Harray[i];
    }
}
//The struct used in main function.
typedef struct
{
    unsigned int L;
    unsigned int M;
    unsigned int H;
} ResultCounter;
Upvotes: 0
Views: 285
Reputation: 213059
The problem is that the ResultCounter struct is 12 bytes in size, so although the first element of your array, c[0], is 16-byte aligned, the second element, c[1], is not. The quickest/easiest fix for now would be to add 4 bytes of padding to this struct, e.g. an additional unused int:
typedef struct
{
    unsigned int L;
    unsigned int M;
    unsigned int H;
    unsigned int unused;
} ResultCounter;
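To make the size arithmetic concrete, here is a minimal sketch of the element offsets for both layouts. It is not taken from the code above: the names Counter12, Counter16 and count_misaligned are made up for illustration, and it assumes a 4-byte unsigned int, a C11 compiler, and an array whose base address is already 16-byte aligned (as memalign(16, ...) is meant to provide).

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* size_t */

/* Layout from the question: 3 x 4 bytes = 12 bytes per element
 * (assuming a 4-byte unsigned int). */
typedef struct {
    unsigned int L;
    unsigned int M;
    unsigned int H;
} Counter12;

/* Padded layout from the fix: 4 x 4 bytes = 16 bytes per element. */
typedef struct {
    unsigned int L;
    unsigned int M;
    unsigned int H;
    unsigned int unused;
} Counter16;

/* Compile-time check that the padded struct keeps every element of an
 * array on a 16-byte boundary when the array base is 16-byte aligned. */
static_assert(sizeof(Counter16) % 16 == 0,
              "Counter16 must be a multiple of 16 bytes");

/* Hypothetical helper: for an array of n elements of elem_size bytes whose
 * base is 16-byte aligned, count the elements that do NOT start on a
 * 16-byte boundary. Element i starts at byte offset i * elem_size. */
size_t count_misaligned(size_t elem_size, size_t n)
{
    size_t misaligned = 0;
    for (size_t i = 0; i < n; i++) {
        if ((i * elem_size) % 16 != 0)
            misaligned++;
    }
    return misaligned;
}
```

With the 12-byte layout the four elements start at offsets 0, 12, 24 and 36, so only c[0] lands on a 16-byte boundary; with the padded layout the offsets are 0, 16, 32 and 48, and every element does. Calling count_misaligned(sizeof(Counter12), 4) and count_misaligned(sizeof(Counter16), 4) shows the difference.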
Upvotes: 3