Reputation: 110
We can do matrix multiplication in several ways. A and B are two matrices, each of size 1000 x 1000. The options are:

1. A*B, using the built-in * operator
2. writing the loops in C
3. writing the loops in matlab

The result is that the * operator is very fast: it takes about 1 second. Can anyone explain why * is so fast, and why the C loops are more efficient than writing the loops in matlab?
I actually used different software, not matlab, to do the computing, but I think similar reasoning should apply.
Upvotes: 1
Views: 743
Reputation: 25058
There is a webpage that describes matrix multiplication speeds in C, comparing naive code with BLAS code:
Naïve code

/* Column-major storage: element (i,j) lives at index i + j*n. */
for (i = 0; i < n; i++)
    for (j = 0; j < n; j++) {
        C[i + j*n] = 0;
        for (k = 0; k < n; k++)
            C[i + j*n] += A[i + k*n] * B[k + j*n];
    }
BLAS code

/* C = 1.0*A*B + 0.0*C; 'N','N' means neither input is transposed. */
dgemm_(&charN, &charN, &n, &n, &n, &one_d, A, &n, B, &n, &zero_d, C, &n);
The page shows the resulting timings, with the BLAS version far ahead. Using one of those libraries may be a good option to improve your speed.
Upvotes: 1
Reputation: 973
matlab is an interpreted language, C is compiled. Therefore C loops are much faster than matlab loops. That explains the difference between options 2 and 3.
Similarly, matlab's A*B is also compiled code under the hood. Then why is it still an order of magnitude faster than the C code? After all, if it were just a simple nested loop, the compiler could optimize it to run about as fast as hand-written assembly. The answer is that it is most likely not just nested loops. The simple nested-loop algorithm runs in O(n^3) time, but if you recursively break up the matrix you can relatively easily get the O(n^2.8) Strassen algorithm (http://en.wikipedia.org/wiki/Strassen_algorithm), which performs well in practice; you can even get down to the O(n^2.37) Coppersmith-Winograd algorithm (http://en.wikipedia.org/wiki/Coppersmith%E2%80%93Winograd_algorithm), which is not quite practical.

My bet is that matlab uses some form of the Strassen algorithm, which not only has better asymptotic speed, but also better cache utilization, because the submatrices it eventually multiplies are small.
Upvotes: 2
Reputation: 272687
Because matrix multiplication is highly optimised in Matlab, using BLAS/LAPACK libraries under the hood.

You will find it hard to beat the performance of those libraries. Certainly, simple nested loops don't take cache effects into account, and will thus exhibit terrible performance.
Upvotes: 3