Reputation: 8640
I realize that the answer is probably hardware specific, but I'm curious if there was a more general intuition that I'm missing?
I asked this question & given the answer, now I'm wondering if I should alter my approach in general to use "(i << 1|1)" instead of "(2*i + 1)"??
Upvotes: 8
Views: 979
Reputation: 30449
I just tested this with gcc-4.7.1 using the source of FrankH, the code generated is
lea 0x1(%rdi,%rdi,1),%eax
retq
no matter if the shift or the multiplication version is used.
Upvotes: 1
Reputation: 12928
i + i + 1
may be faster than other two,
because addition is faster than multiplication and can be faster than shift.
Upvotes: 0
Reputation: 18247
Just an experiment regarding answers given about "... it'll use LEA
":
The following code:
int main(int argc, char **argv)
{
#ifdef USE_SHIFTOR
return (argc << 1 | 1);
#else
return (2 * argc + 1);
#endif
}
will, with gcc -fomit-frame-pointer -O8 -m{32|64}
(for 32bit or 64bit) compile into the following assembly code:
080483a0 <main>: 80483a0: 8b 44 24 04 mov 0x4(%esp),%eax 80483a4: 8d 44 00 01 lea 0x1(%eax,%eax,1),%eax 80483a8: c3 ret
00000000004004c0 <main>: 4004c0: 8d 44 3f 01 lea 0x1(%rdi,%rdi,1),%eax 4004c4: c3 retq
-DUSE_SHIFTOR
:080483a0 <main>: 80483a0: 8b 44 24 04 mov 0x4(%esp),%eax 80483a4: 01 c0 add %eax,%eax 80483a6: 83 c8 01 or $0x1,%eax 80483a9: c3 ret
-DUSE_SHIFTOR
:00000000004004c0 <main>: 4004c0: 8d 04 3f lea (%rdi,%rdi,1),%eax 4004c3: 83 c8 01 or $0x1,%eax 4004c6: c3 retq
In fact, it's true that most cases will use LEA
. Yet the code is not the same for the two cases. There are two reasons for that:
<<
or |
cannot(x + 1) == (x | 1)
only is true if !(x & 1)
else the addition carries over to the next bit. In general, adding one only results in having the lowest bit set in half of the cases.While we (and the compiler, probably) know that the second is necessarily applicable, the first is still a possibility. The compiler therefore creates different code, since the "or-version" requires forcing bit zero to 1.
Upvotes: 8
Reputation: 22989
The faster is the first form (the one with the shift right), in fact the shr instruction takes 4 clock cycles to complete in the worst case, while the mul 10 in the best case. However, the best form should be decided by the compiler because it has a complete view of the others (assembly) instructions.
Upvotes: -2
Reputation: 6677
Output of gcc with the -S option (no compiler flags given):
.LCFI3:
movl 8(%ebp), %eax
addl %eax, %eax
orl $1, %eax
popl %ebp
ret
.LCFI1:
movl 8(%ebp), %eax
addl %eax, %eax
addl $1, %eax
popl %ebp
ret
I'm not sure which one is which, but I don't believe it matters.
If the compiler does no optimizations at all, then the second would probably translate to faster assembly instructions. How long each instruction takes is completely architecture-dependent. Most compilers will optimize them to be the same assembly-level instructions.
Upvotes: 4
Reputation: 888
Nobody cares. Nor should they.
Quit worrying about that and get your code correct, simple, and done.
Upvotes: 0
Reputation: 41482
Any but the most brain-dead compiler will see those expressions as equivalent and compile them to the same executable code.
Typically it's not really worth worrying too much about optimizing simple arithmetic expressions like these, since it's the sort of thing compilers are best at optimizing. (Unlike many other cases in which a "smart compiler" could do the the right thing, but an actual compiler falls flat.)
This will work out to the same pair of instructions on PPC, Sparc, and MIPS, by the way: a shift followed by an add. On the ARM it'll cook down to a single fused shift-add instruction, and on x86 it'll probably be a single LEA
op.
Upvotes: 5
Reputation: 882596
Since the ISO standard doesn't actually mandate performance requirements, this will depend on the implementation, the compiler flags chosen, the target CPU and quite possibly the phase of the moon.
These sort of optimisations (saving a couple of cycles) almost always pale into insignificance in terms of return on investment, against macro-level optimisations like algorithm selection.
Aim for readability of code first and foremost. If your intent is to shift bits and OR
, use the bit-shift version. If your intent is to multiply, use the *
version. Only worry about performance once you've established there's an issue.
Any decent compiler will optimise it far better than you can anyway :-)
Upvotes: 13