Reputation: 1959
I am constructing the partial derivative of a function in C. The process consists mainly of a large number of small loops, each of which fills one column of the matrix. Because the matrix is huge, the code needs to be efficient. I have a few implementation plans in mind, which I won't go into here.
I know that smart compilers try to take advantage of the cache automatically, but I would like to learn more about how the cache works and how to write efficient code and efficient loops. I would appreciate it if you could point me to some resources or websites where I can learn to write efficient code in terms of reducing memory access time and taking advantage of the cache.
I know my request may look sloppy, but I am not a computer guy. I did some research without success, so any help is appreciated.
Thanks
Upvotes: 2
Views: 357
Reputation: 61289
It is probably best that you write the code in the most readable and understandable way you can and then profile it to see where the bottlenecks really are. Often your conception of where you need efficiency doesn't match up with reality.
Modern compilers do a decent job with many aspects of optimization and it seems unlikely that the process of looping will itself be a problem. Perhaps you should consider focusing on simplifying the calculation done by each loop.
Otherwise, you'll be looking at things such as accessing your matrix row by row so that you take advantage of the row-major storage order C uses (see this question).
You'll want to build your for loops without if statements inside, because if statements create what is called "branching". The processor essentially guesses which branch will be taken and pays a sometimes hefty penalty if it guesses wrong.
To extend that theme, you want to do as little inside the for loop as possible. You'll also want to give it static limits, that is, a bound that is fixed before the loop starts, e.g.:
for(int i=1;i<100;i++) //This is better than
for(int i=1;i<N/i;i++) //this
Static limits mean that very little effort is spent deciding whether the loop should keep going. They also let you use OpenMP to divvy up the work in the loop among several threads, which can sometimes speed things up considerably. This is simple to do:
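#pragma omp parallel for
for(int i=0;i<100;i++) {
    /* loop body; each iteration must be independent of the others */
}
And voilà! The loop is parallelized, provided its iterations are independent of one another and the code is compiled with OpenMP enabled (for example, -fopenmp with GCC or Clang).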
Upvotes: 2
Reputation: 11893
Well-written code tends to be efficient (though not always optimal). Start by writing good, clean code; then, if you actually have a performance problem, it can be isolated and addressed.
Upvotes: 5