liskawc

Reputation: 85

Algorithm for matrix-matrix multiplication of size O(100)

While I realize this is a niche question, I am wondering whether anyone knows of an algorithm for matrix-matrix multiplication that performs really well (meaning it achieves a high FLOP rate on the CPU, or possibly the GPU) for matrices between 100x100 and 500x500?

While I know xgemm and xgemm3m are nice, unfortunately they only reach their high FLOP rates for matrices bigger than 1000x1000.

Thanks for the help :)

Upvotes: 0

Views: 535

Answers (1)

High Performance Mark

Reputation: 78314

Not an answer but too long for a comment.

I think you are drawing the wrong conclusion from the Intel data. You seem to be thinking:

Ah-ha, dgemm can zip along at 300GFLOP/s for large matrices, but only at a miserable 100GFLOP/s for small matrices -- where is the method which will multiply small matrices at 300GFLOP/s ?

I think along these lines:

Ah-ha, dgemm is most efficient on large arrays; hmm, I wonder whether there are fixed costs attached to calling it which show up as comparatively poor performance on smaller job sizes. I expect that if there were faster algorithms for those small matrices, the bright folk at Intel would have implemented them and made dgemm smart enough to select the right internal code paths for any given problem size. After all, dense matrix multiplication is a key part of LINPACK which, for all its faults, is often used for benchmarking high-performance computers, and Intel are highly motivated to demonstrate the excellence of their machinery by using such benchmarks.
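To make the fixed-cost idea concrete, here is a toy model in Python. It is only a sketch: the function name `effective_gflops` and the 10 microsecond per-call overhead are made up for illustration, not measured, and the 300 GFLOP/s peak is just the round figure used above. Real small-matrix slowdowns also involve things like data packing, cache effects and thread start-up, which this model lumps into one constant.

```python
def effective_gflops(n, peak_gflops=300.0, overhead_s=10e-6):
    """Modelled GFLOP/s for an n x n times n x n multiply.

    peak_gflops and overhead_s are illustrative assumptions,
    not measurements of any real dgemm implementation.
    """
    flops = 2.0 * n ** 3                      # about n^3 multiplies plus n^3 adds
    compute_s = flops / (peak_gflops * 1e9)   # time spent if running at peak rate
    return flops / (compute_s + overhead_s) / 1e9


if __name__ == "__main__":
    for n in (100, 200, 500, 1000, 2000):
        print(f"n={n:5d}  {effective_gflops(n):6.1f} GFLOP/s")
```

With these assumed numbers the modelled rate climbs steadily with n and only approaches the peak once the O(n^3) work dwarfs the constant cost, which is the same shape as the curve the question complains about. A different algorithm does not remove a constant cost like this.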

Now I'm not saying that you aren't as bright as the folk at Intel, and my train of thought may be defective, but I put it to you that you will struggle to write, or acquire, a code which outperforms dgemm on your small matrices on Intel hardware. I look forward to seeing the evidence that I am wrong on this.
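If you do want to gather that evidence, a small timing harness is enough to compare candidates on equal terms. The sketch below is an assumption about how one might do it, not a reference benchmark: `measure_gflops` and `naive_matmul` are hypothetical helper names, and the pure-Python `naive_matmul` is only a placeholder to be swapped for a wrapper around dgemm or whatever routine you are testing.

```python
import time


def naive_matmul(a, b):
    """Placeholder triple-loop multiply on lists of lists (slow by design)."""
    rows, inner, cols = len(a), len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(rows)]


def measure_gflops(matmul, n, repeats=5):
    """Return the best observed GFLOP/s of matmul on two n x n matrices."""
    a = [[float(i + j) for j in range(n)] for i in range(n)]
    b = [[float(i - j) for j in range(n)] for i in range(n)]
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        matmul(a, b)
        best = min(best, time.perf_counter() - start)  # keep the fastest run
    return 2.0 * n ** 3 / best / 1e9                   # 2*n^3 flops per multiply
```

Calling `measure_gflops(candidate, n)` for n between 100 and 500, once for dgemm and once for the challenger, gives directly comparable numbers. Taking the best of several repeats reduces noise from the first, cold call.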

Upvotes: 1
