Reputation: 1
__m128d c1, c2, c3, c4, a1, a2, b1, b2;
int ida = 2;
for (int i = 0; i < n; i++) {
    b1 = _mm_load_pd(b + i*n);
    b2 = _mm_load_pd(b + i*n + ida);
    for (int j = 0; j < n/2; j++) {
        a1 = _mm_load_pd(a + i + j*2*n);
        a2 = _mm_load_pd(a + i + j*2*n + n);
        c1 = _mm_load_pd(c + j*2*n);
        c2 = _mm_load_pd(c + j*2*n + n);
        c3 = _mm_load_pd(c + j*2*n + ida);
        c4 = _mm_load_pd(c + j*2*n + n + ida);
        c1 = _mm_add_pd(c1, _mm_mul_pd(a1, b1));
        c2 = _mm_add_pd(c2, _mm_mul_pd(a2, b1));
        c3 = _mm_add_pd(c3, _mm_mul_pd(a1, b2));
        c4 = _mm_add_pd(c4, _mm_mul_pd(a2, b2));
        _mm_store_pd(c + j*2*n, c1);
        _mm_store_pd(c + j*2*n + n, c2);
        _mm_store_pd(c + j*2*n + ida, c3);
        _mm_store_pd(c + j*2*n + n + ida, c4);
    }
}
I get a segmentation fault when I run this code, but I don't know why.
The matrix is laid out like this:
a1 a2 a3 a4
a5 a6 ...
I want to multiply two n*n matrices.
Upvotes: 0
Views: 157
Reputation: 213160
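It looks like your loads will be misaligned in at least some cases, e.g. when i = 1: a + i is then at best 8-byte aligned, while _mm_load_pd requires a 16-byte-aligned address and faults otherwise. Change all instances of _mm_load_pd to _mm_loadu_pd in order to handle misaligned cases.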
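As a minimal sketch of what the unaligned variants allow (the helper name add_pair_unaligned and its parameters are hypothetical, not taken from the question), the function below reads two doubles starting at an arbitrary element offset. It uses _mm_loadu_pd and _mm_storeu_pd, so neither pointer needs to be 16-byte aligned:

```c
#include <emmintrin.h>  /* SSE2: __m128d, _mm_loadu_pd, _mm_storeu_pd, _mm_add_pd */

/* Hypothetical helper for illustration only:
   dst[0..1] += src[offset..offset+1], for any offset, odd or even. */
static void add_pair_unaligned(double *dst, const double *src, int offset)
{
    /* With an odd offset, src + offset is only 8-byte aligned,
       so _mm_load_pd could fault here; _mm_loadu_pd does not. */
    __m128d s = _mm_loadu_pd(src + offset);
    __m128d d = _mm_loadu_pd(dst);
    _mm_storeu_pd(dst, _mm_add_pd(d, s));
}
```

If the arrays are allocated with 16-byte alignment and every offset is known to be even, the aligned _mm_load_pd and _mm_store_pd can be kept for those accesses; the unaligned forms are the safe choice whenever that cannot be guaranteed.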
Upvotes: 1