Reputation: 11
I have code that is doing a lot of these comparison operations. I was wondering which is the most efficient one to use. Is it likely that the compiler will correct it if I intentionally choose the "wrong" one?
int a, b;
// Assign a value to a and b.
// Now check whether either is zero.
// The worst?
if (a * b == 0) // ...
// The best?
if (a & b == 0) // ...
// The most obvious?
if (a == 0 || b == 0) // ...
Other ideas?
Upvotes: 0
Views: 200
Reputation: 320671
There's no "most efficient way to do it in C", if by "efficiency" you mean the efficiency of the compiled code.
Firstly, even if we assume that the compiler translates C language operator into their "obvious" machine counterparts (i.e. C multiplication into machine multiplication etc) the efficiency of each method will differ from one hardware platform to the other. Even if we restrict our consideration to a very specific sequence of instructions on a very specific hardware platform, it still can exhibit different performance in different surrounding contexts, depending, for example, on how well the whole thing agrees with the branch prediction heuristic in the given CPU.
Secondly, modern C compilers rarely translate C operators into their "obvious" machine counterparts. Often the instructions used in machine code will have very little in common with the C code. It is possible that many "completely different" methods of performing the check at C level will actually be translated into the same sequence of machine instructions by a smart compiler. At the same time the same C code might get translated into different sequences machine instructions when the surrounding contexts are different.
In other words, there's no meaningful answer to your question, unless you really really really localize it to a specific hardware platform, specific compiler version and specific set of compilation settings. And that will make it too localized to be useful.
That usually means that in general case the best way to do it is to write the most readable code. Just do
if (a == 0 || b == 0)
The readability of the code will not only help the human reader to understand it, but will also increase the probability of the compiler properly interpreting your intent and generating the most optimal code.
But if you really have to squeeze the last CPU cycle out of your performance-critical code, you have to try different versions and compare their relative efficiency manually.
Upvotes: 0
Reputation: 15269
It's possible to see which variant generates fewer assembly instructions, but it's a separate matter to see which one actually executes in less time.
To help you analyze the first matter, learn to use your C compiler's command-line flags to capture its intermediate output. GCC is a common choice for a C compiler. Let's look at its unoptimized assembly code for two different programs.
#include <stdio.h>
void report_either_zero()
{
int a = 1;
int b = 0;
if (a == 0 || b == 0)
{
puts("One of them is zero.");
}
}
Save that text to a file such as zero-test.c, and run the following command:
gcc -S zero-test.c
GCC will emit a file called zero-test.s, which is the assembly code it would normally submit to the assembler as it generates object code.
Let's look at the relevant fragment of the assembly code. I'm using gcc version 4.2.1 on Mac OS X generating x86 64-bit instructions.
_report_either_zero:
Leh_func_begin1:
pushq %rbp
Ltmp0:
movq %rsp, %rbp
Ltmp1:
subq $32, %rsp
Ltmp2:
movl %edi, -4(%rbp)
movq %rsi, -16(%rbp)
movl $1, -20(%rbp) // a = 1
movl $0, -24(%rbp) // b = 0
movl -24(%rbp), %eax // Get ready to compare a.
cmpl $0, %eax // Does zero equal a?
je LBB1_2 // If so, go to label LBB1_2.
movl -24(%rbp), %eax // Otherwise, get ready to compare b.
cmpl $0, %eax // Does zero equal b?
jne LBB1_3 // If not, go to label LBB1_3.
LBB1_2:
leaq L_.str(%rip), %rax
movq %rax, %rdi
callq _puts // Otherwise, write the string to standard output.
LBB1_3:
addq $32, %rsp
popq %rbp
ret
Leh_func_end1:
You can see where the we load the integer values 1 and 0 into registers, then prepare to compare the first to zero, and then again with the second if the first is nonzero.
Now let's try a different approach with the comparison, to see how the assembly code changes. Note that this is not the same predicate; this one checks whether both numbers are zero.
#include <stdio.h>
void report_both_zero()
{
int a = 1;
int b = 0;
if (!(a | b))
{
puts("Both of them are zero.");
}
}
The assembly code is a little different:
_report_both_zero:
Leh_func_begin1:
pushq %rbp
Ltmp0:
movq %rsp, %rbp
Ltmp1:
subq $16, %rsp
Ltmp2:
movl $1, -4(%rbp) // a = 1
movl $0, -8(%rbp) // b = 0
movl -4(%rbp), %eax // Get ready to operate on a.
movl -8(%rbp), %ecx // Get ready to operate on b too.
orl %ecx, %eax // Combine a and b via bitwise OR.
cmpl $0, %eax // Does zero equal the result?
jne LBB1_2 // If not, go to label LBB1_2.
leaq L_.str(%rip), %rax
movq %rax, %rdi
callq _puts // Otherwise, write the string to standard output.
LBB1_2:
addq $16, %rsp
popq %rbp
ret
Leh_func_end1:
If the first number is zero, the first variant does less work—in terms of the number of assembly instructions involved—by avoiding a second register move. If the first number is not zero, the second variant does less work by avoiding a second comparison to zero.
The question now is whether "move, move, bitwise or, compare" runs faster that "move, compare, move, compare." The answer could come down to things like whether the processor learns to predict how often the first integer is zero, and whether it is or not consistently.
If you ask the compiler to optimize this code, the example is too simple; the compiler decides at compile time that no comparison is necessary, and just condenses that code to an unconditional request to write the string. It's interesting to change the code to operate on parameters rather than constants, and see how the optimizer handles the situation differently.
Variant one:
#include <stdio.h>
void report_either_zero(int a, int b)
{
if (a == 0 || b == 0)
{
puts("One of them is zero.");
}
}
Variant two (again, a different predicate):
#include <stdio.h>
void report_both_zero(int a, int b)
{
if (!(a | b))
{
puts("Both of them are zero.");
}
}
Generate the optimized assembly code with this command:
gcc -O -S zero-test.c
Let us know what you find.
Upvotes: 1
Reputation: 154146
The most efficient is certainly the most obvious, if by efficiency, you are measuring the programmer's time.
If by measuring efficiency using the processor's time, profiling your candidate solution is the best way to answer - for the target machine you profiled.
But this exercise demonstrated a pitfall of programmer optimization. The 3 candidates are not functionally equivalent for all int
.
If you was a functional equivalent alternative...
I think the last candidate and a 4th one deserve comparison.
if ((a == 0) || (b == 0))
if ((a == 0) | (b == 0))
Due to the variation of compilers, optimization and CPU branch prediction, one should profile, rather than pontificate, to determine relative performance. OTOH, a good optimizing compiler may give you the same code for both.
I recommend the code that is easiest to maintain.
Upvotes: 0
Reputation: 14077
If you want to find whether or not one of two integers is zero using one comparison instruction ...
if ((a << b) == a)
If a is zero, then no amount of shifting it to the left will change its value.
If b is zero, then there is no shifting performed.
It is possible (I am too lazy to check) that there is some undefined behaviour should b be negative or really large.
However, due to the non-intuitiveness, it would be strongly recommended to implement this as a macro (with an appropriate comment).
Hope this helps.
Upvotes: 0
Reputation: 2987
This will not likely have much (if any, given modern compiler optimizers) effect on the overall performance of your app. If you really must know, you should write some code to test the performance of each for your compiler. However, as a best guess, I'd say...
if ( !( a && b ) )
This will short-circuit if the first happens to be 0.
Upvotes: 0
Reputation: 41513
In general, if there's a fast way of doing a simple thing, you can assume the compiler will do it that fast way. And remember that the compiler is outputting machine language, not C -- the fastest method probably can't be correctly represented as a set of C constructs.
Also, the third method there is the only one that always works. The first one fails if a and b are 1<<16, and the second you already know doesn't work.
Upvotes: 2