Reputation: 34621
How do you tell the compiler to unroll loops based on the number of iterations or some other attribute? Or, how do you turn on loop unrolling optimization in Visual Studio 2005?
EDIT: E.g.
//Code Snippet 1
vector<int> b;
for(int i=0;i<3;++i) b.push_back(i);
As opposed to
//Code Snippet 2
vector<int> b;
b.push_back(0);
b.push_back(1);
b.push_back(2);
push_back() is an example, I could replace this with anything which can take a long time.
But I read somewhere that I can use Code 1 and the compiler can unroll it to Code 2 if the loop satisfies some criteria. So my question is: how do you do that? There's already a discussion on SO as to which one is more efficient, but any comments on that are appreciated anyway.
Upvotes: 1
Views: 4917
Reputation: 44881
Right-click on the project, select Properties and navigate as shown in this screenshot: http://img200.imageshack.us/img200/8685/propsm.jpg
WRT loop unrolling, note that with MS Visual Studio, optimizing for size rather than speed is often reported to produce faster code, because smaller code tends to cause fewer instruction-cache misses.
Upvotes: 3
Reputation: 340426
Note that you say:
push_back() is an example, I could replace this with anything which can take a long time.
In fact, if push_back() (or whatever you replace it with) takes a long time, that's a situation where loop unrolling would be a waste of effort. Looping generally isn't particularly slow; loop unrolling makes sense when the work done inside the loop is very small. In that case the looping constructs might start to dominate the processing of that stretch of execution.
As I'm sure you'll get in many other answers - don't worry about this type of thing unless you actually find that it's a bottleneck. 99% of the time, it won't be.
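One way to check whether the loop is a bottleneck at all is to time both snippets from the question. The following is only a rough sketch: time_microseconds, fill_with_loop and fill_unrolled are hypothetical names introduced here, and it assumes a C++11 compiler (newer than Visual Studio 2005) for std::chrono. An optimizer is free to discard work whose result is unused, so treat the numbers with suspicion and prefer a real profiler.

```cpp
#include <cassert>
#include <chrono>
#include <vector>

// Hypothetical helper: runs `work` `repeats` times and returns the
// elapsed wall-clock time in microseconds.
template <typename Work>
long long time_microseconds(Work work, int repeats) {
    assert(repeats > 0);
    const auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < repeats; ++r) work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
}

// Code Snippet 1 from the question: the loop form.
std::vector<int> fill_with_loop() {
    std::vector<int> b;
    for (int i = 0; i < 3; ++i) b.push_back(i);
    return b;
}

// Code Snippet 2 from the question: the hand-unrolled form.
std::vector<int> fill_unrolled() {
    std::vector<int> b;
    b.push_back(0);
    b.push_back(1);
    b.push_back(2);
    return b;
}
```

Calling time_microseconds([] { fill_with_loop(); }, 1000000) and the same with fill_unrolled gives two figures to compare. If they are indistinguishable, unrolling is not worth pursuing here.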
Upvotes: 5
Reputation: 224159
Loop unrolling will not magically make the code executed in the loop run faster. All it does is save the few CPU cycles spent incrementing and comparing the loop variable and taking the branch. So it only makes sense in very tight loops where the loop body itself does next to nothing.
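To make "very tight loop" concrete, here is a minimal sketch of unrolling by hand. The names sum_simple and sum_unrolled4 are made up for illustration, and the factor of 4 is an arbitrary assumption. The body is a single addition, so the loop bookkeeping is a noticeable share of the work, and the unrolled version does that bookkeeping once per four elements instead of once per element.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Plain loop: one comparison and one increment per element.
long long sum_simple(const std::vector<int>& v) {
    long long total = 0;
    for (std::size_t i = 0; i < v.size(); ++i) total += v[i];
    return total;
}

// Unrolled by 4: one comparison and one increment per four elements.
long long sum_unrolled4(const std::vector<int>& v) {
    const std::size_t n = v.size();
    const std::size_t blocks_end = n - n % 4;  // largest multiple of 4 <= n
    long long total = 0;
    std::size_t i = 0;
    for (; i < blocks_end; i += 4) {
        total += v[i];
        total += v[i + 1];
        total += v[i + 2];
        total += v[i + 3];
    }
    for (; i < n; ++i) total += v[i];  // leftover 0 to 3 elements
    assert(i == n);
    return total;
}
```

An optimizing compiler may already perform this transformation on sum_simple, so the hand-written version is mainly useful for seeing what unrolling means.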
Regarding your example: While push_back() takes amortized constant time, this does include the occasional allocate-copy-deallocate cycle plus the copying of the actual objects. I very much doubt that the comparisons in the loop play a significant role compared to that. And if you replace it with anything else taking a long time, the same applies.
Of course, this could be wrong on any specific CPU and right on any other. With the idiosyncrasies of modern CPU architectures with their caches, instruction pipelines and branch prediction schemes, it has become very hard to outsmart the compiler in optimizing code. That you would attempt to optimize a loop with a "heavy" body by unrolling it seems to be a hint that you don't know enough to achieve much in this. (I'm trying hard to say this so you won't be offended. I'm the first to admit that I'm a loser in this game myself.)
If you're having problems with performance, IME in 9 out of 10 cases eliminating silly errors (like copying complex objects) and optimizing algorithms and data structures is what you should look at.
(If you still believe your problem falls into the 1-out-of-10 category, then try Intel's compiler. The last time I looked at it you could download a test version for free, it plugged into VS, was very easy to set up, and brought about 0.5% of speed gain in the application I tested it in.)
Upvotes: 5
Reputation: 248199
It's generally fairly simple: "You enable optimizations".
If you tell the compiler to optimize your code, then loop unrolling is one of the many optimizations it tries to apply.
Keep in mind, though, that unrolling is not always going to produce faster code. It might cause cache misses (in both data and instruction cache). And with the advanced branch prediction found in modern CPUs, the cost of the branches that make up a loop is often negligible.
Sometimes, the compiler may determine that unrolling would produce slower code, and then it won't do it.
Upvotes: 6
Reputation: 45533
Usually you just let the compiler do its job. If the number of iterations is known at compile time, and compiler optimizations are turned on, the compiler will balance code size against branch reduction and unroll the loops it judges worth unrolling.
If that's really not what you want, there's also the possibility of doing it yourself with Duff's Device: (from Wikipedia)
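send(to, from, count)
register short *to, *from;   /* `to` is a memory-mapped output register, so it is never incremented */
register count;
{
    register n = (count + 7) / 8;   /* passes through the do-while; assumes count > 0 */
    switch (count % 8) {            /* jump into the loop body to handle the remainder first */
    case 0: do { *to = *from++;
    case 7:      *to = *from++;
    case 6:      *to = *from++;
    case 5:      *to = *from++;
    case 4:      *to = *from++;
    case 3:      *to = *from++;
    case 2:      *to = *from++;
    case 1:      *to = *from++;
            } while (--n > 0);
    }
}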
This gives you unrolling with runtime-determined iteration counts.
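The listing above is pre-standard C and writes every value to the same address. As a sketch only, a standard C++ variant for an ordinary memory-to-memory copy could look like the following. The name duff_copy and the early return for an empty range are additions made here, not part of the original device.

```cpp
#include <cassert>
#include <cstddef>

// Memory-to-memory variant of Duff's Device: note `*to++`, where the
// original writes to a fixed address with `*to`.
void duff_copy(short* to, const short* from, std::size_t count) {
    if (count == 0) return;  // the original assumes count > 0
    assert(to != nullptr && from != nullptr);
    std::size_t n = (count + 7) / 8;  // number of passes through the do-while
    switch (count % 8) {              // jump into the body to copy the remainder first
    case 0: do { *to++ = *from++;
    case 7:      *to++ = *from++;
    case 6:      *to++ = *from++;
    case 5:      *to++ = *from++;
    case 4:      *to++ = *from++;
    case 3:      *to++ = *from++;
    case 2:      *to++ = *from++;
    case 1:      *to++ = *from++;
            } while (--n > 0);
    }
}
```

For plain copying, std::copy or memcpy is likely to be at least as fast with a current compiler, so this is mostly of interest as an illustration of the technique.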
If it's still compile-time unrolling you want, and the built-in optimizations don't give you the finer-grained control you're after, you could create a C++ template to do it. This is a pretty trivial template application, and since it is all done at compile time, you don't lose any function inlining or other optimizations that the compiler might do in addition.
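One possible shape for such a template, as a sketch: Unroll, PushBack and fill_three are hypothetical names invented for this example, and the code sticks to C++03 features so that an older compiler should accept it. The recursion is expanded at compile time, but whether the resulting calls are actually inlined into straight-line code is still up to the optimizer.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: calls f(0), f(1), ..., f(N-1), with the recursion
// expanded at compile time. N must not be negative.
template <int N>
struct Unroll {
    template <typename F>
    static void run(F& f) {
        Unroll<N - 1>::run(f);  // emit iterations 0 .. N-2 first
        f(N - 1);               // then iteration N-1
    }
};

// Base case: zero iterations, nothing to do.
template <>
struct Unroll<0> {
    template <typename F>
    static void run(F&) {}
};

// Function object standing in for the loop body from the question.
struct PushBack {
    std::vector<int>& target;
    explicit PushBack(std::vector<int>& t) : target(t) {}
    void operator()(int i) const { target.push_back(i); }
};

// Equivalent of Code Snippet 1, without a runtime loop counter.
std::vector<int> fill_three() {
    std::vector<int> b;
    PushBack body(b);
    Unroll<3>::run(body);
    assert(b.size() == 3);
    return b;
}
```

Here Unroll<3>::run(body) expands to body(0); body(1); body(2);, which matches Code Snippet 2 from the question.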
Upvotes: 7