techno

Reputation: 6518

Improving the efficiency of Standard Matrix Multiplication Algorithm?

How can I improve the efficiency of standard matrix multiplication algorithm?

The main operation involved in this approach is: C[i][j]+=A[i][p]*B[p][j]

What can be done to improve the efficiency of the algorithm?
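For reference, the standard algorithm built around that inner operation looks like this (a plain Python sketch, purely illustrative):

```python
# Naive O(n^3) matrix multiply: C[i][j] += A[i][p] * B[p][j]
def matmul(A, B):
    n, m, k = len(A), len(B[0]), len(B)
    C = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            for p in range(k):
                C[i][j] += A[i][p] * B[p][j]
    return C
```

Three nested loops over an n x n problem give the n^3 multiply-adds that the answers below try to reduce or speed up.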

Upvotes: 1

Views: 1897

Answers (5)

AlexQueue

Reputation: 6551

Well there's Strassen's Algorithm, which, depending on the size of your matrix, is ever so slightly faster than the standard algorithm that you listed. There are of course even faster algorithms, but they aren't so simple to implement.

The standard algorithm is O(N^3), Strassen's algorithm is O(N^2.807), and Coppersmith-Winograd is O(N^2.376).
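Strassen's trick is to multiply 2x2 blocks with seven recursive products instead of eight. A minimal sketch, assuming square matrices whose size is a power of two (real implementations pad or switch algorithms for other shapes):

```python
import numpy as np

def strassen(A, B, cutoff=2):
    """Strassen multiply for n x n matrices, n a power of two (sketch)."""
    n = A.shape[0]
    if n <= cutoff:
        return A @ B  # fall back to the standard algorithm on small blocks
    h = n // 2
    A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]
    # Seven half-size products instead of the eight the naive split needs
    M1 = strassen(A11 + A22, B11 + B22)
    M2 = strassen(A21 + A22, B11)
    M3 = strassen(A11, B12 - B22)
    M4 = strassen(A22, B21 - B11)
    M5 = strassen(A11 + A12, B22)
    M6 = strassen(A21 - A11, B11 + B12)
    M7 = strassen(A12 - A22, B21 + B22)
    return np.block([[M1 + M4 - M5 + M7, M3 + M5],
                     [M2 + M4,           M1 - M2 + M3 + M6]])
```

The 7 products per halving step are where the log2(7) ≈ 2.807 exponent comes from; the extra additions mean it only pays off beyond some size cutoff.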

Upvotes: 0

Darren Engwirda

Reputation: 7015

You might want to have a look at using a BLAS (Basic Linear Algebra Subprograms) library: Intel offers MKL, AMD has ACML, and there's also the open-source GotoBLAS.

The (dense) matrix-matrix multiply kernel will be a ?GEMM call, where the ? indicates the floating-point type; for example, DGEMM calls the double-precision routine.
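As a quick illustration of calling DGEMM from a high-level language, here is a sketch using SciPy's BLAS bindings (assuming SciPy is available; from C you would call `cblas_dgemm` or the Fortran `dgemm_` directly):

```python
import numpy as np
from scipy.linalg.blas import dgemm  # double-precision GEMM

A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[5.0, 6.0], [7.0, 8.0]])
C = dgemm(alpha=1.0, a=A, b=B)  # computes C = 1.0 * A @ B
```

GEMM actually computes the more general `alpha * A @ B + beta * C`, which is why it takes scaling parameters rather than just the two matrices.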

Unless you're extremely confident you know what you're doing with low-level optimisations, these libraries will probably offer better performance than something you can code by hand.

If you do want to have a go at coding this yourself then you may want to consider the following:

  1. Use "vector" instructions. SSE, SSE2..4 instructions are widely supported, and some newer CPUs also support AVX instructions.
  2. Nested loop unrolling to maximise the ratio of floating point operations to load/store operations.
  3. Block-wise algorithms to ensure effective cache use.
  4. Multi-threading.

This reference might give you an idea of the current state of things:

High-performance implementation of the level-3 BLAS - K Goto.

Hope this helps.

Upvotes: 1

Patrick87

Reputation: 28322

If the question concerns a chain of matrix multiplications - M1 x M2 x ... x Mn - then there is another optimization technique based on dynamic programming (the matrix chain ordering problem), which is sort of another ball-game. Note that this doesn't improve the efficiency of multiplying two matrices; however, if you are multiplying three or more matrices in a pairwise fashion, you can optimize at a higher level by choosing the cheapest parenthesization. Just thought I'd throw this answer on the heap to round out the information.
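The dynamic program mentioned above is the classic matrix chain ordering recurrence; a compact sketch:

```python
def matrix_chain_order(dims):
    # Matrix i has shape dims[i-1] x dims[i]; returns the minimum number
    # of scalar multiplications needed to compute the whole chain.
    n = len(dims) - 1
    cost = [[0] * (n + 1) for _ in range(n + 1)]
    for length in range(2, n + 1):          # length of the sub-chain
        for i in range(1, n - length + 2):
            j = i + length - 1
            # Try every split point k between i and j
            cost[i][j] = min(cost[i][k] + cost[k + 1][j]
                             + dims[i - 1] * dims[k] * dims[j]
                             for k in range(i, j))
    return cost[1][n]
```

For example, with shapes 10x30, 30x5, 5x60, multiplying as (M1 M2) M3 costs 1500 + 3000 = 4500 scalar multiplications, versus 27000 for M1 (M2 M3).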

Upvotes: 0

tskuzzy

Reputation: 36476

  1. Cache blocking - making sure you're properly using and reusing values in the cache
  2. Better algorithm - the "by-definition" way to multiply matrices is not optimal, take a look at Strassen's algorithm
  3. Parallelization - if your machine has more than one core and/or processor, you can divide and conquer
  4. SIMD - take advantage of SSE vector instructions in modern CPU architectures
  5. GPGPU - modern GPUs are optimized to do just this sort of thing. Look into CUDA and OpenCL.

Note that using these methods does not guarantee better performance; a lot of tuning is required to get a significant speedup. There is a lot of money going into figuring out how to multiply matrices quickly, so there is no shortage of journal articles on the subject.
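Item 3, parallelization by divide and conquer, can be sketched by splitting the result into horizontal strips and giving one strip to each worker (a Python illustration using threads over NumPy, which releases the GIL during the multiply):

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def parallel_matmul(A, B, workers=4):
    # Divide and conquer over rows: each worker computes one strip of C.
    row_chunks = np.array_split(np.arange(A.shape[0]), workers)
    C = np.empty((A.shape[0], B.shape[1]))
    def work(rows):
        C[rows] = A[rows] @ B   # independent strips, no synchronization needed
    with ThreadPoolExecutor(workers) as ex:
        list(ex.map(work, row_chunks))
    return C
```

The row strips are independent, so no locking is needed; in a compiled implementation the same decomposition maps directly onto OpenMP threads or GPU thread blocks.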

Upvotes: 1

NPE

Reputation: 500883

I would suggest reading Chapter 1 of Golub and Van Loan's Matrix Computations, which addresses this exact question.

Upvotes: 0
