Reputation: 8640

In C++, which is faster? (2 * i + 1) or (i << 1 | 1)?

I realize that the answer is probably hardware specific, but I'm curious if there was a more general intuition that I'm missing?

I asked this question & given the answer, now I'm wondering if I should alter my approach in general to use "(i << 1|1)" instead of "(2*i + 1)"??

Upvotes: 8

Answers (8)

Gunther Piez

Reputation: 30449

I just tested this with gcc-4.7.1 using the source of FrankH, the code generated is

lea    0x1(%rdi,%rdi,1),%eax
retq

no matter if the shift or the multiplication version is used.

Upvotes: 1

Abyx

Reputation: 12928

i + i + 1 may be faster than other two, because addition is faster than multiplication and can be faster than shift.

Upvotes: 0

FrankH.

Reputation: 18247

Just an experiment regarding answers given about "... it'll use LEA":
The following code:

int main(int argc, char **argv)
{
#ifdef USE_SHIFTOR
return (argc << 1 | 1);
#else
return (2 * argc + 1);
#endif
}

will, with gcc -fomit-frame-pointer -O8 -m{32|64} (for 32bit or 64bit) compile into the following assembly code:

x86, 32bit:

080483a0 <main>:
80483a0:    8b 44 24 04             mov    0x4(%esp),%eax
80483a4:    8d 44 00 01             lea    0x1(%eax,%eax,1),%eax
80483a8:    c3                      ret

x86, 64bit:

00000000004004c0 <main>:
4004c0: 8d 44 3f 01             lea    0x1(%rdi,%rdi,1),%eax
4004c4: c3                      retq

x86, 64bit, -DUSE_SHIFTOR:

080483a0 <main>:
80483a0:    8b 44 24 04             mov    0x4(%esp),%eax
80483a4:    01 c0                   add    %eax,%eax
80483a6:    83 c8 01                or     $0x1,%eax
80483a9:    c3                      ret

x86, 32bit, -DUSE_SHIFTOR:

00000000004004c0 <main>:
4004c0: 8d 04 3f                lea    (%rdi,%rdi,1),%eax
4004c3: 83 c8 01                or     $0x1,%eax
4004c6: c3                      retq

In fact, it's true that most cases will use LEA. Yet the code is not the same for the two cases. There are two reasons for that:

addition can overflow and wrap around, while bit operations like << or | cannot
(x + 1) == (x | 1) only is true if !(x & 1) else the addition carries over to the next bit. In general, adding one only results in having the lowest bit set in half of the cases.

While we (and the compiler, probably) know that the second is necessarily applicable, the first is still a possibility. The compiler therefore creates different code, since the "or-version" requires forcing bit zero to 1.

Upvotes: 8

BlackBear

Reputation: 22989

The faster is the first form (the one with the shift right), in fact the shr instruction takes 4 clock cycles to complete in the worst case, while the mul 10 in the best case. However, the best form should be decided by the compiler because it has a complete view of the others (assembly) instructions.

Upvotes: -2

Jonathan Sternberg

Reputation: 6677

Output of gcc with the -S option (no compiler flags given):

.LCFI3:
        movl    8(%ebp), %eax
        addl    %eax, %eax
        orl     $1, %eax
        popl    %ebp
        ret

.LCFI1:
        movl    8(%ebp), %eax
        addl    %eax, %eax
        addl    $1, %eax
        popl    %ebp
        ret

I'm not sure which one is which, but I don't believe it matters.

If the compiler does no optimizations at all, then the second would probably translate to faster assembly instructions. How long each instruction takes is completely architecture-dependent. Most compilers will optimize them to be the same assembly-level instructions.

Upvotes: 4

Stephen Hazel

Reputation: 888

Nobody cares. Nor should they.
Quit worrying about that and get your code correct, simple, and done.

Upvotes: 0

Crashworks

Reputation: 41482

Any but the most brain-dead compiler will see those expressions as equivalent and compile them to the same executable code.

Typically it's not really worth worrying too much about optimizing simple arithmetic expressions like these, since it's the sort of thing compilers are best at optimizing. (Unlike many other cases in which a "smart compiler" could do the the right thing, but an actual compiler falls flat.)

This will work out to the same pair of instructions on PPC, Sparc, and MIPS, by the way: a shift followed by an add. On the ARM it'll cook down to a single fused shift-add instruction, and on x86 it'll probably be a single LEA op.

Upvotes: 5

paxdiablo

Reputation: 882596

Since the ISO standard doesn't actually mandate performance requirements, this will depend on the implementation, the compiler flags chosen, the target CPU and quite possibly the phase of the moon.

These sort of optimisations (saving a couple of cycles) almost always pale into insignificance in terms of return on investment, against macro-level optimisations like algorithm selection.

Aim for readability of code first and foremost. If your intent is to shift bits and OR, use the bit-shift version. If your intent is to multiply, use the * version. Only worry about performance once you've established there's an issue.

Any decent compiler will optimise it far better than you can anyway :-)

Upvotes: 13

In C++, which is faster? (2 * i + 1) or (i &lt;&lt; 1 | 1)?

Answers (8)

Related Questions

In C++, which is faster? (2 * i + 1) or (i << 1 | 1)?