Reputation: 29
I have a piece of code that I need to make faster; any tips or solutions would be useful. I'm looking to get more cache hits by reordering memory accesses to be as sequential as possible.
int elij, veci, prod, curr, next;
for(int j=0; j<mat.cols; j++){
    VSET(res,j,0);                  // initialize res[j] to zero
    for(int i=0; i<mat.rows; i++){
        elij = MGET(mat,i,j);
        veci = VGET(vec,i);
        prod = elij * veci;
        curr = VGET(res,j);
        next = curr + prod;
        VSET(res,j, next);          // add on the newest product
    }
}
I was told that switching the column and row loops might help because it avoids repeatedly accessing memory locations that are far apart, but I'm not sure how to do that.
This function is meant to multiply the transpose of a matrix by a vector. Any suggestions appreciated.
Upvotes: 1
Views: 80
Reputation: 14491
On the surface, this is the classic vector-by-matrix multiplication. Any decent compiler should optimize the code, provided it can see through the getters and setters (VGET, MGET, VSET).
If you MUST use the wrappers, the best approach is to declare the getters and setters 'inline' (assuming they are not already macros). If the wrappers are not required, consider removing them, since they add little, and index directly with v[i], etc. This frees the optimizer to apply vectorization and similar transformations.
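As a rough sketch of what direct indexing could look like, here is a version that also swaps the two loops, which is the reordering the question asks about. The question does not show the real types, so matrix_t, vector_t and mat_transpose_mult_vec are hypothetical names, and the sketch assumes the matrix is stored row-major (element (i, j) at data[i * cols + j]). If your storage is column-major, the original loop order is already the sequential one.

```c
#include <stddef.h>

/* Hypothetical layouts: the question does not show the real types.
   Assumed row-major: element (i, j) lives at data[i * cols + j]. */
typedef struct { int rows, cols; int *data; } matrix_t;
typedef struct { int len; int *data; } vector_t;

/* res = transpose(mat) * vec.
   The loops are swapped relative to the question: the outer loop walks
   rows and the inner loop walks columns, so each matrix row is read
   contiguously instead of jumping `cols` elements on every step. */
void mat_transpose_mult_vec(const matrix_t *mat, const vector_t *vec,
                            vector_t *res)
{
    /* Zero the whole result first, since every row now contributes
       to every res[j]. */
    for (int j = 0; j < mat->cols; j++)
        res->data[j] = 0;

    for (int i = 0; i < mat->rows; i++) {
        const int *row = mat->data + (size_t)i * mat->cols; /* start of row i */
        const int veci = vec->data[i];                      /* reused across the row */
        for (int j = 0; j < mat->cols; j++)
            res->data[j] += row[j] * veci;
    }
}
```

With this ordering both row[j] and res->data[j] are accessed sequentially in the inner loop, which is usually friendlier to the cache and easier for the compiler to vectorize. Whether it actually helps in your case is worth measuring.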
Upvotes: 1