Reputation: 21865
I'm trying to figure out a good Loop unrolling for multiplying two matrices .
For example if we wanted to Sum a NxN matrix :
void SumMatrix(int *M, int n, int *result)
{
int i,j;
*result = 0;
for (i=0; i<n; i++)
for (j=0; j<n; j++)
*result += M[j][i];
}
We can do this :
void SumMatrix(int *M, int n, int *result)
{
int i;
int size = n*n;
int last = size%8;
int acc1 = 0;
int acc2 = 0;
int *pEnd = M+size-last;
for (; M<pEnd; M+=8)
{
acc1 += (*M + *(M+1)) + (*(M+2) + *(M+3));
acc2 += (*(M+4) + *(M+5)) + (*(M+6) + *(M+7));
}
/* adding the last entries */
while (last--)
acc1 += *(M++);
*result = acc1+acc2;
}
But I've tried to find a (GOOD) way to multiply 2 matrices , however found none at the moment .
Remark : this is no homework task , I have an exam today and just thought about this question , I think it could be a fine question for an exam , don't you ?
I'd appreciate any help
Regards
Ron
Upvotes: 0
Views: 4296
Reputation: 448
Look how complex the ATLAS project is, which provides an optimized version of a BLAS library (based primarily on matrix multiplication). It should consider not only thread-level parallelism, but the memory hierarchy (not only unrolling, but cache tiling and register tiling, software pipelining and so forth). It's usually written by hands or optimized by the "auto-tuner", like ATLAS. In case you want to unravel thread-level parallelism, you'd better to use a "tiled algorithm" and spread the resulting tiles computation between your threads.
Upvotes: 0
Reputation: 7548
Most compilers will do the unrolling for you (you might need to turn on a flag, or set it to an optimization level - I believe -funroll-loops
does it for gcc).
Also, with your question, the fact that it is a 2D matrix doesn't matter, since you are adding all the numbers up. If you are limited to a single process/thread, adding the numbers up sequentially will be the fastest because that has optimal caching performance. You might get some benefit out of SSE or vector instructions; again, today's compilers can do these for you with such a simple problem.
Upvotes: 2