Lvargas

Reputation: 3

Matrix Multiplication Using SSE

I am trying to get a working example of multiplying two matrices using SIMD, because I need to compare the running time of that algorithm with a "normal" one. That is why I tried the code from Efficient 4x4 matrix multiplication (C vs assembly).

#include <xmmintrin.h>
#include <stdio.h>


void M4x4_SSE(float *A, float *B, float *C) {
    __m128 row1 = _mm_load_ps(&B[0]);
    __m128 row2 = _mm_load_ps(&B[4]);
    __m128 row3 = _mm_load_ps(&B[8]);
    __m128 row4 = _mm_load_ps(&B[12]);
    for(int i=0; i<4; i++) {
        __m128 brod1 = _mm_set1_ps(A[4*i + 0]);
        __m128 brod2 = _mm_set1_ps(A[4*i + 1]);
        __m128 brod3 = _mm_set1_ps(A[4*i + 2]);
        __m128 brod4 = _mm_set1_ps(A[4*i + 3]);
        __m128 row = _mm_add_ps(
                    _mm_add_ps(
                        _mm_mul_ps(brod1, row1),
                        _mm_mul_ps(brod2, row2)),
                    _mm_add_ps(
                        _mm_mul_ps(brod3, row3),
                        _mm_mul_ps(brod4, row4)));
        _mm_store_ps(&C[4*i], row);
    }
}


int main(){

  float A[4] __attribute__((aligned(16))) = {1,2,3,4};
  float B[4] __attribute__((aligned(16))) = {5,6,7,8};
  float C[4] __attribute__((aligned(16)));

  M4x4_SSE(A,B,C);

}

I am not familiar with C or C++, so this has been difficult. I get:

*** stack smashing detected ***: ./prueba terminated
Aborted (core dumped)

when I run my program. I need to scale to at least a 500x500 matrix. Thanks.

Upvotes: 0

Views: 2993

Answers (1)

1201ProgramAlarm

Reputation: 32727

The arrays you declare in main have 4 elements each, but your multiplication code reads and writes 16 elements of each. Writing past the allocated space (elements 4 and later, starting in the second iteration of your i loop) clobbers the stack, which results in the error you see. Declaring each array with 16 elements (float A[16], and so on) keeps the function within bounds.
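As a minimal sketch of that fix, the code below keeps the kernel from the question and passes it 16-element, 16-byte-aligned arrays. The scalar routine M4x4_scalar and the helper check_M4x4 are hypothetical names added here only to compare the two results; they are not part of the original code. The aligned attribute is the GCC/Clang syntax already used in the question.

```c
#include <xmmintrin.h>

/* C = A * B for row-major 4x4 float matrices (same kernel as in the question).
   Each pointer must be 16-byte aligned and refer to 16 floats. */
void M4x4_SSE(const float *A, const float *B, float *C) {
    __m128 row1 = _mm_load_ps(&B[0]);
    __m128 row2 = _mm_load_ps(&B[4]);
    __m128 row3 = _mm_load_ps(&B[8]);
    __m128 row4 = _mm_load_ps(&B[12]);
    for (int i = 0; i < 4; i++) {
        /* Broadcast each element of row i of A across a vector. */
        __m128 brod1 = _mm_set1_ps(A[4*i + 0]);
        __m128 brod2 = _mm_set1_ps(A[4*i + 1]);
        __m128 brod3 = _mm_set1_ps(A[4*i + 2]);
        __m128 brod4 = _mm_set1_ps(A[4*i + 3]);
        /* Row i of C is a linear combination of the rows of B. */
        __m128 row = _mm_add_ps(
                    _mm_add_ps(
                        _mm_mul_ps(brod1, row1),
                        _mm_mul_ps(brod2, row2)),
                    _mm_add_ps(
                        _mm_mul_ps(brod3, row3),
                        _mm_mul_ps(brod4, row4)));
        _mm_store_ps(&C[4*i], row);
    }
}

/* Hypothetical scalar reference, used only to check the SSE result. */
void M4x4_scalar(const float *A, const float *B, float *C) {
    for (int i = 0; i < 4; i++) {
        for (int j = 0; j < 4; j++) {
            float sum = 0.0f;
            for (int k = 0; k < 4; k++)
                sum += A[4*i + k] * B[4*k + j];
            C[4*i + j] = sum;
        }
    }
}

/* Hypothetical helper: returns 1 if both versions agree, 0 otherwise. */
int check_M4x4(void) {
    /* 16 elements each, because a 4x4 matrix has 16 entries. */
    float A[16] __attribute__((aligned(16)));
    float B[16] __attribute__((aligned(16)));
    float C[16] __attribute__((aligned(16)));
    float ref[16];
    for (int i = 0; i < 16; i++) {
        A[i] = (float)(i + 1);
        B[i] = (float)(i + 1);
    }
    M4x4_SSE(A, B, C);
    M4x4_scalar(A, B, ref);
    for (int i = 0; i < 16; i++)
        if (C[i] != ref[i]) return 0;
    return 1;
}
```

Calling check_M4x4() from main should return 1 without tripping the stack protector, since every load and store now stays inside the arrays. The exact comparison works here only because the inputs are small integers; for general float data the two summation orders can differ slightly, so a tolerance would be needed.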

Upvotes: 4
