Reputation: 956
This is somewhat related to my post here.
I'm trying to implement matrix multiplication in Rust and I think in order to do this efficiently I need to be able to get the column data from the matrix. This is difficult as I'm storing my matrices in row-major format.
I'm using an unrolled dot product implementation which is detailed in the above link and also in my previous question. I want to be able to feed this method the row from one matrix and the column from another.
How can I efficiently select the column data from my matrix? And more generally: how can I select arbitrary data patterns (like in R, matlab, numpy etc.)?
I've tried:
Upvotes: 2
Views: 210
Reputation: 1280
If you use more clever for loops you get the answer for your question. I mean, you don't have to pull out a column from your matrix if you reorder your for loops. That way, you keep your CPU cache warm.
If your current algorithm looks like this:
// traditional multiplication
for i in 0..a_rows {
for j in 0..b_cols {
for k in 0..a_cols {
c[i][j] += a[i][k] * b[k][j];
}
}
}
You produce a lot of cache misses since b[k][j] doesn't access your data sequentially.
for i in 0..a_rows {
for k in 0..a_cols {
// Note, that j iterates over a column of B
for j in 0..b_cols {
c[i][j] += a[i][k] * b[k][j];
}
}
}
If you swap the two inner loops you iterate over B's column sequentially and utilize your cache. First, you will access to b[k][0]
, then b[k][1]
and so on. If an element is 4
byte, you can access the next 12
elements from the cache directly (as 64 byte is the most common L1 cache line size). The traditional approach doesn't use the cache so efficiently.
Upvotes: 1