Reputation: 59607
There's a comment in the zlib compression library (which is used in the Chromium project among many others) which implies that a do-while loop in C generates "better" code on most compilers. Here is the snippet of code where it appears.
do {
} while (*(ushf*)(scan+=2) == *(ushf*)(match+=2) &&
*(ushf*)(scan+=2) == *(ushf*)(match+=2) &&
*(ushf*)(scan+=2) == *(ushf*)(match+=2) &&
*(ushf*)(scan+=2) == *(ushf*)(match+=2) &&
scan < strend);
/* The funny "do {}" generates better code on most compilers */
https://code.google.com/p/chromium/codesearch#chromium/src/third_party/zlib/deflate.c&l=1225
Is there any evidence that most (or any) compilers would generate better (e.g. more efficient) code?
Update: Mark Adler, one of the original authors, gave a bit of context in the comments.
Upvotes: 90
Views: 7167
Reputation: 311050
A while
loop is often compiled as a do-while
loop with an initial branch to the condition, i.e.
bra $1 ; unconditional branch to the condition
$2:
; loop body
$1:
tst <condition> ; the condition
brt $2 ; branch if condition true
whereas the compilation of a do-while
loop is the same without the initial branch. You can see from that that while()
is inherently less efficient by the cost of the initial branch, which is however only paid once. [Compare to the naive way of implementing while,
which requires both a conditional branch and an unconditional branch per iteration.]
Having said that, they aren't really comparable alternatives. It is painful to transform a while
loop into a do-while
loop and vice versa. They do different things. And in this case the several method calls would totally dominate whatever the compiler did with while
as against do-while.
Upvotes: 10
Reputation: 19736
In a nutshell (tl;dr):
I'm interpreting the comment in OPs' code a little differently, I think the "better code" they claim to have observed was due to moving the actual work into the loop "condition". I completely agree however that it's very compiler specific and that the comparison they made, while being able to produce a slightly different code, is mostly pointless and probably obsolete, as I show below.
Details:
It's hard to say what the original author meant by his comment about this do {} while
producing better code, but i'd like to speculate in another direction than what was raised here - we believe that the difference between do {} while
and while {}
loops is pretty slim (one less branch as Mystical said), but there's something even "funnier" in this code and that's putting all the work inside this crazy condition, and keeping the internal part empty (do {}
).
I've tried the following code on gcc 4.8.1 (-O3), and it gives an interesting difference -
#include "stdio.h"
int main (){
char buf[10];
char *str = "hello";
char *src = str, *dst = buf;
char res;
do { // loop 1
res = (*dst++ = *src++);
} while (res);
printf ("%s\n", buf);
src = str;
dst = buf;
do { // loop 2
} while (*dst++ = *src++);
printf ("%s\n", buf);
return 0;
}
After compiling -
00000000004003f0 <main>:
...
; loop 1
400400: 48 89 ce mov %rcx,%rsi
400403: 48 83 c0 01 add $0x1,%rax
400407: 0f b6 50 ff movzbl 0xffffffffffffffff(%rax),%edx
40040b: 48 8d 4e 01 lea 0x1(%rsi),%rcx
40040f: 84 d2 test %dl,%dl
400411: 88 16 mov %dl,(%rsi)
400413: 75 eb jne 400400 <main+0x10>
...
;loop 2
400430: 48 83 c0 01 add $0x1,%rax
400434: 0f b6 48 ff movzbl 0xffffffffffffffff(%rax),%ecx
400438: 48 83 c2 01 add $0x1,%rdx
40043c: 84 c9 test %cl,%cl
40043e: 88 4a ff mov %cl,0xffffffffffffffff(%rdx)
400441: 75 ed jne 400430 <main+0x40>
...
So the first loop does 7 instructions while the second does 6, even though they're supposed to do the same work. Now, I can't really tell if there's some compiler smartness behind this, probably not and it's just coincidental but I haven't checked how it interacts with other compiler options this project might be using.
On clang 3.3 (-O3) on the other hand, both loops generate this 5 instructions code :
400520: 8a 88 a0 06 40 00 mov 0x4006a0(%rax),%cl
400526: 88 4c 04 10 mov %cl,0x10(%rsp,%rax,1)
40052a: 48 ff c0 inc %rax
40052d: 48 83 f8 05 cmp $0x5,%rax
400531: 75 ed jne 400520 <main+0x20>
Which just goes to show that compilers are quite different, and advancing at a far faster rate than some programmer may have anticipated several years ago. It also means that this comment is pretty meaningless and probably there because no one had ever checked if it still makes sense.
Bottom line - if you want to optimize to the best possible code (and you know how it should look like), do it directly in assembly and cut the "middle-man" (compiler) from the equation, but take into account that newer compilers and newer HW might make this optimization obsolete. In most cases it's far better to just let the compiler do that level of work for you, and focus on optimizing the big stuff.
Another point that should be made - instruction count (assuming this is what the original OPs' code was after), is by no means a good measurement for code efficiency. Not all instructions were created equal, and some of them (simple reg-to-reg moves for e.g.) are really cheap as they get optimized by the CPU. Other optimization might actually hurt CPU internal optimizations, so eventually only proper benchmarking counts.
Upvotes: 16
Reputation: 471559
First of all:
A do-while
loop is not the same as a while
-loop or a for
-loop.
while
and for
loops may not run the loop body at all.do-while
loop always runs the loop body at least once - it skips the initial condition check.So that's the logical difference. That said, not everyone strictly adheres to this. It is quite common for while
or for
loops to be used even when it is guaranteed that it will always loop at least once. (Especially in languages with foreach loops.)
So to avoid comparing apples and oranges, I'll proceed assuming that the loop will always run at least once. Furthermore, I won't mention for
loops again since they are essentially while
loops with a bit of syntax sugar for a loop counter.
So I'll be answering the question:
If a while
loop is guaranteed to loop at least once, is there any performance gain from using a do-while
loop instead.
A do-while
skips the first condition check. So there is one less branch and one less condition to evaluate.
If the condition is expensive to check, and you know you're guaranteed to loop at least once, then a do-while
loop could be faster.
And while this is considered a micro-optimization at best, it is one that the compiler can't always do: Specifically when the compiler is unable to prove that the loop will always enter at least once.
In other words, a while-loop:
while (condition){
body
}
Is effectively the same as this:
if (condition){
do{
body
}while (condition);
}
If you know that you will always loop at least once, that if-statement is extraneous.
Likewise at the assembly level, this is roughly how the different loops compile to:
do-while loop:
start:
body
test
conditional jump to start
while-loop:
test
conditional jump to end
start:
body
test
conditional jump to start
end:
Note that the condition has been duplicated. An alternate approach is:
unconditional jump to end
start:
body
end:
test
conditional jump to start
... which trades away the duplicate code for an additional jump.
Either way, it's still worse than a normal do-while
loop.
That said, compilers can do what they want. And if they can prove that the loop always enters once, then it has done the work for you.
But things are bit weird for the particular example in the question because it has an empty loop body. Since there is no body, there's no logical difference between while
and do-while
.
FWIW, I tested this in Visual Studio 2012:
With the empty body, it does actually generate the same code for while
and do-while
. So that part is likely a remnant of the old days when compilers weren't as great.
But with a non-empty body, VS2012 manages to avoid duplication of the condition code, but still generates an extra conditional jump.
So it's ironic that while the example in the question highlights why a do-while
loop could be faster in the general case, the example itself doesn't seem to give any benefit on a modern compiler.
Considering how old the comment was, we can only guess at why it would matter. It's very possible that the compilers at the time weren't capable of recognizing that the body was empty. (Or if they did, they didn't use the information.)
Upvotes: 111
Reputation:
This discussion of while vs. do efficiency is completely pointless in this case, as there is no body.
while (Condition)
{
}
and
do
{
}
while (Condition);
are absolutely equivalent.
Upvotes: 0
Reputation:
The remark is not about the choice of the control statement (do vs. while), it is about the loop unrolling !!!
As you can see, this is a string comparison function (string elements probably being 2 bytes long), which could have been written with a single comparison rather than four in a shortcut-and expression.
This latter implementation is for sure faster, as it does a single check of the end-of-string condition after every four element comparisons, whereas standard coding would involve one check per comparison. Said differently, 5 tests per 4 element vs. 8 tests per 4 element.
Anyway, it will work only if the string length is a multiple of four or has a sentinel element (so that the two strings are guaranteed to differ past the strend
border). Pretty risky !
Upvotes: 8
Reputation:
Is there any evidence that most (or any) compilers would generate better (e.g. more efficient) code?
Not much, unless you look at the actual generated assembly of an actual, specific compiler on a specific platform with some specific optimization settings.
This was probably worth worrying about decades ago (when ZLib has been written), but certainly not nowadays, unless you found, by real profiling, that this removes a bottleneck from your code.
Upvotes: 25