Reputation: 570
I wrote an OpenMP program that has to calculate the product of m matrices. I want to give each thread N rows to process.
This is my code:
double val;
omp_set_num_threads(4);
for(i=0;i<m;i++){
    #pragma omp parallel for private(f,c,k)
    for(f=0;f<N;f++){ // each thread works on its 2 assigned rows
        //printf("Thread %d, row %d matrix %d \n",omp_get_thread_num(),f,i);
        for(c=0;c<N;c++){ // each row is combined with every column of the main matrix
            val=0;
            for(k=0;k<N;k++){
                /*if(k==0){
                    AUX[f*N+c]=RES[f*N+k]*A[i][k*N+c];
                }*/
                //else{
                AUX[f*N+c]=val+RES[f*N+k]*A[i][k*N+c];
                val=AUX[f*N+c];
                //}
            }
        }
        for(c=0;c<N;c++){
            RES[f*N+c]=AUX[f*N+c];
        }
    }
}
The result is correct, but the sequential algorithm performs better.
I also wrote a Pthreads solution and it works fine, so I think I made a mistake when I parallelized this one with OpenMP.
Upvotes: 1
Views: 371
Reputation: 570
I found a solution! First, I hadn't paid attention to the way I stored the data in the matrices, and I was getting a lot of cache misses. So now the RES matrix is stored by rows and the others by columns.
I also made the "val" variable private, and the performance improved.
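For reference, here is a minimal sketch of what those two changes could look like. It is not the original program: the function name chain_product and its signature are hypothetical, and it assumes each A[i] is stored by columns, so that element (k, c) lives at A[i][c*n + k]. Declaring val inside the parallel loop makes it private to each thread, and the column-wise layout means the inner k loop reads both RES and A[i] contiguously.

```c
#include <assert.h>

/* Hypothetical helper: multiplies RES (n x n, stored by rows) in place by
 * each of the m matrices A[0..m-1] in turn.
 * Assumption: every A[i] is stored by columns, i.e. element (k, c) is at
 * A[i][c*n + k]. AUX is caller-provided scratch space of n*n doubles. */
void chain_product(double *RES, double **A, double *AUX, int m, int n)
{
    assert(m >= 0 && n > 0);
    for (int i = 0; i < m; i++) {
        /* Rows are independent: row f of the result only needs row f of RES. */
        #pragma omp parallel for
        for (int f = 0; f < n; f++) {
            for (int c = 0; c < n; c++) {
                double val = 0.0; /* declared inside the loop, so private per thread */
                for (int k = 0; k < n; k++) {
                    /* Both operands are walked with stride 1 over k. */
                    val += RES[f*n + k] * A[i][c*n + k];
                }
                AUX[f*n + c] = val;
            }
            /* Copy the finished row back before the next matrix is applied. */
            for (int c = 0; c < n; c++) {
                RES[f*n + c] = AUX[f*n + c];
            }
        }
    }
}
```

With GCC, OpenMP is enabled with -fopenmp; without that flag the pragma is ignored and the loops simply run sequentially. Keeping the shared val from the question and adding it to the private(...) clause should have the same effect as declaring it inside the loop.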
Upvotes: 1