Reputation: 3593
Given matrices A and P, I need to compute the product below (I believe this is sometimes called a congruence transformation, but I'm not sure of the term)
X = P A Transpose(P)
I was thinking that the fastest way would be
for (int i = 0; i < n; i++) {
    for (int j = 0; j < n; j++) {
        for (int k = 0; k < n; k++)
            for (int l = 0; l < n; l++)
                X[i][j] += P[i][l] * A[l][k] * P[j][k];
    }
}
However, this is O(n^4), whereas doing it as two regular matrix multiplications costs twice O(n^3). Am I missing something here, or should I stick with the two multiplications?
T = A Transpose(P)
X = P T
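In code, the two-step version might look like this (a sketch assuming n × n matrices of doubles stored row-major in flat arrays; the function name is illustrative):

```cpp
#include <vector>

// Compute X = P * A * P^T using two O(n^3) multiplications
// instead of one O(n^4) quadruple loop.
// All matrices are n x n, row-major in a flat std::vector<double>.
std::vector<double> congruence(const std::vector<double>& P,
                               const std::vector<double>& A, int n) {
    std::vector<double> T(n * n, 0.0), X(n * n, 0.0);
    // Step 1: T = A * P^T, i.e. T[l][j] = sum_k A[l][k] * P[j][k]
    for (int l = 0; l < n; ++l)
        for (int j = 0; j < n; ++j)
            for (int k = 0; k < n; ++k)
                T[l * n + j] += A[l * n + k] * P[j * n + k];
    // Step 2: X = P * T, i.e. X[i][j] = sum_l P[i][l] * T[l][j]
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j)
            for (int l = 0; l < n; ++l)
                X[i * n + j] += P[i * n + l] * T[l * n + j];
    return X;
}
```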
Upvotes: 4
Views: 773
Reputation: 33669
If your goal is to do this fast, then you should not bother writing your own matrix multiplication: use a library such as Eigen. It's true that there are matrix multiplication algorithms with better asymptotic time complexity than O(n^3), but it's also true that many people put too much faith in asymptotic time complexity; constant factors and memory behavior usually dominate at practical sizes.
Also, in my experience working with large matrices in scientific research, they have tended to be quite sparse, so I think the practical cases for large dense matrix multiplication are fewer than for sparse matrix multiplication. The algorithms for sparse matrix multiplication are very different from those for dense matrix multiplication.
To answer your question about multiplying three matrices: you should do matrix multiplication twice, but the order can matter. Matrix multiplication is associative, so look into matrix chain multiplication. Using the example from Wikipedia: A is a 10 × 30 matrix, B is a 30 × 5 matrix, and C is a 5 × 60 matrix. Then,
(AB)C = (10×30×5) + (10×5×60) = 1500 + 3000 = 4500 operations
A(BC) = (30×5×60) + (10×30×60) = 9000 + 18000 = 27000 operations.
When all the matrices have the same size (as in your question) it does not matter though.
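The arithmetic above can be spelled out with a small cost helper (a sketch; the function names are made up, and the cost model just counts p·q·r scalar multiplications for a (p × q)(q × r) product):

```cpp
// Scalar multiplication count for a (p x q) * (q x r) product.
long long cost(long long p, long long q, long long r) { return p * q * r; }

// The two parenthesizations from the Wikipedia example:
long long cost_AB_then_C() { return cost(10, 30, 5) + cost(10, 5, 60); }  // (AB)C
long long cost_A_then_BC() { return cost(30, 5, 60) + cost(10, 30, 60); } // A(BC)
```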
If you plan to continue optimizing dense matrix multiplication on the CPU you will need to use loop tiling, SIMD, threading, and maybe assembly. After several weeks you can probably write something that competes with Eigen.
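To give a flavor of the first of those steps, here is a minimal sketch of loop tiling on its own (the block size of 32 is a guess that should be tuned; a competitive kernel would add SIMD, register blocking, and threading on top):

```cpp
#include <algorithm>
#include <vector>

// Tiled C = A * B for n x n row-major matrices.
// Working on Bsz x Bsz blocks keeps pieces of A, B, and C
// resident in cache while they are reused.
void matmul_tiled(const std::vector<double>& A, const std::vector<double>& B,
                  std::vector<double>& C, int n, int Bsz = 32) {
    std::fill(C.begin(), C.end(), 0.0);
    for (int ii = 0; ii < n; ii += Bsz)
        for (int kk = 0; kk < n; kk += Bsz)
            for (int jj = 0; jj < n; jj += Bsz)
                for (int i = ii; i < std::min(ii + Bsz, n); ++i)
                    for (int k = kk; k < std::min(kk + Bsz, n); ++k) {
                        double a = A[i * n + k]; // hoisted out of the j loop
                        for (int j = jj; j < std::min(jj + Bsz, n); ++j)
                            C[i * n + j] += a * B[k * n + j];
                    }
}
```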
Upvotes: 3