Reputation: 223

Performance of comparisons in C++ ( foo >= 0 vs. foo != 0 )

I've been working on a piece of code recently where performance is very important, and essentially I have the following situation:

int len = some_very_big_number;
int counter = some_rather_small_number;

for( int i = len; i >= 0; --i ){
    while( counter > 0 && costly other stuff here ){
        /* do stuff */
        --counter;
    }
    /* do more stuff */
}

So here I have a loop that runs very often and for a certain number of runs the while block will be executed as well until the variable counter is reduced to zero and then the while loop will not be called because the first expression will be false.

The question is now, if there is a difference in performance between using
counter > 0 and counter != 0?
I suspect there would be, does anyone know specifics about this.

Upvotes: 2

Answers (15)

Lolo

Reputation: 4159

I stumbled across this question just now, 3 years after it is asked, so I am not sure how useful the answer will still be... Still, I am surprised not to see clearly stated that answering your question requires to know two and only two things:

which processor you target
which compiler you work with

To the first point, each processor has different instructions for tests. On one given processor, two similar comparisons may turn up to take a different number of cycles. For example, you may have a 1-cycle instruction to do a gt (>), eq (==), or a le (<=), but no 1-cycle instruction for other comparisons like a ge (>=). Following a test, you may decide to execute conditional instructions, or, more often, as in your code example, take a jump. There again, cycles spent in jumps take a variable number of cycles on most high-end processors, depending whether the conditional jump is taken or not taken, predicted or not predicted. When you write code in assembly and your code is time critical, you can actually take quite a bit of time to figure out how to best arrange your code to minimize overall the cycle count and may end up in a solution that may have to be optimized based on the number of time a given comparison returns a true or false.

Which leads me to the second point: compilers, like human coders, try to arrange the code to take into account the instructions available and their latencies. Their job is harder because some assumptions an assembly code would know like "counter is small" is hard (not impossible) to know. For trivial cases like a loop counter, most modern compilers can at least recognize the counter will always be positive and that a != will be the same as a > and thus generate the best code accordingly. But that, as many mentioned in the posts, you will only know if you either run measurements, or inspect your assembly code and convince yourself this is the best you could do in assembly. And when you upgrade to a new compiler, you may then get a different answer.

Upvotes: 0

PinkTriangles

Reputation: 81

Clearly the solution is to use the correct data type.

Make counter an unsigned int. Then it can't be less than zero. Your compiler will obviously know this and be forced to choose the optimal solution.

Or you could just measure it.

You could also think about how it would be implemented...(here we go on a tangent)...

less than zero: the sign bit would be set, so need to check 1 bit
equal to zero : the whole value would be zero, so need to check all the bits

Of course, computers are funny things, and it may take longer to check a single bit than the whole value (however many bytes it is on your platform).

You could just measure it...

And you could find out that one it more optimal than another (under the conditions you measured it). But your program will still run like a dog because you spent all your time optimising the wrong part of your code.

The best solution is to use what many large software companies do - blame the hardware for not runnnig fast enough and encourage your customer to upgrade their equipment (which is clearly inferior since your product doesn't run fast enough).

< /rant>

Upvotes: 0

Mike Dunlavey

Reputation: 40679

Thinking that the type of comparison is going to make a difference, without knowing it, is the definition of guessing.

Don't guess.

Upvotes: 1

ShuggyCoUk

Reputation: 36448

I would add that the overwhelming performance aspects of this code on modern cpus will be dominated not by the comparison instruction but whether the comparison is well predicted since any mis-predict will waste many more cycles than any integral operation.

As such loop unrolling will probably be the biggest winner but measure, measure, measure.

Upvotes: 1

Lucas

Reputation: 14129

Assuming you are developing for the x86 architecture, when you look at the assembly output it will come down to jns vs jne. jns will check the sign flag and jne will check the zero flag. Both operations, should as far as I know, be equally costly.

Upvotes: 0

Maciek

Reputation: 19893

As Jim said, when in doubt see for yourself :

#include <boost/date_time/posix_time/posix_time.hpp>
#include <iostream>
using namespace boost::posix_time;
using namespace std;
void main()
{
    ptime Before = microsec_clock::universal_time(); // UTC NOW
    // do stuff here
    ptime After = microsec_clock::universal_time(); // UTC NOW
    time_duration delta_t = After - Before; // How much time has passed?
    cout << delta_t.total_seconds() << endl; // how much seconds total?
    cout << delta_t.fractional_seconds() << endl; // how much microseconds total?
}

Here's a pretty nifty way of measuring time. Hope that helps.

Upvotes: 3

Mark Ransom

Reputation: 308206

There will be a huge difference if the counter starts with a negative number. Otherwise, on every platform I'm familiar with, there won't be a difference.

Upvotes: 4

T.E.D.

Reputation: 44804

In programming, the following statement is the sign designating the road to Hell:

I've been working on a piece of code recently where performance is very important

Write your code in the cleanest, most easy to understand way. Period.

Once that is done, you can measure its runtime. If it takes too long, measure the bottlenecks, and speed up the biggest ones. Keep doing that until it is fast enough.

The list of projects that failed or suffered catastrophic loss due to a misguided emphasis on blind optimization is large and tragic. Don't join them.

Upvotes: 18

Alex

Reputation: 6933

OK, you can measure this, sure. However, these sorts of comparisons are so fast that you are probably going to see more variation based on processor swapping and scheduling then on this single line of code.

This smells of unnecessary, and premature, optimization. Right your program, optimize what you see. If you need more, profile, and then go from there.

Upvotes: 1

Khaled Alshaya

Reputation: 96879

Do you think that what will solve your problem! :D

    if(x >= 0)
00CA1011  cmp         dword ptr [esp],0 
00CA1015  jl          main+2Ch (0CA102Ch) <----
...
    if(x != 0)
00CA1026  cmp         dword ptr [esp],0 
00CA102A  je          main+3Bh (0CA103Bh) <----

Upvotes: 19

Martin Liversage

Reputation: 106836

Is there a difference between counter > 0 and counter != 0? It depends on the platform.

A very common type of CPU are those from Intel we have in our PC's. Both comparisons will map to a single instruction on that CPU and I assume they will execute at the same speed. However, to be certain you will have to perform your own benchmark.

Upvotes: 3

Cellfish

Reputation: 2192

I think you're spending time optimizing the wrong thing. "costly other stuff here", "do stuff" and "do more stuff" are more important to look at. That is where you'll make the big performance improvements I bet.

Upvotes: 8

David Thornley

Reputation: 57046

There may be no difference. You could try examining the assembly output for each.

That being said, the only way to tell if any difference is significant is to try it both ways and measure. I'd bet that the change makes no difference whatsoever with optimizations on.

Upvotes: 0

Matt J

Reputation: 45193

In general, they should be equivalent (both are usually implemented in single-cycle instructions/micro-ops). Your compiler may do some strange special-case optimization that is difficult to reason about from the source level, which may make either one slightly faster. Also, equality testing is more energy-efficient than inequality testing (>), though the system-level effect is so small as to not merit discussion.

Upvotes: 0

Jim Lewis

Reputation: 45075

To measure is to know.

Upvotes: 33

Performance of comparisons in C++ ( foo &gt;= 0 vs. foo != 0 )

Answers (15)

Related Questions

Performance of comparisons in C++ ( foo >= 0 vs. foo != 0 )