hans
hans

Reputation: 131

Helping the compiler to optimize branchy code sequences

I have code sequences in C/C++ that contain lots of branches, something like this:

if( condition1 )
    return true;
if( condition2 )
    return true;
...
return false;

(which is equivalent to return condition1 || condition2 || ...;)

Evaluating each of the conditions requires several memory accesses (all read-only) but the compiler misses an important optimization opportunity by not moving memory accesses before a previous condition is evaluated. The reason being that condition2's memory accesses may segfault when condition1 is true. Well I know that they don't and I would like the compiler to do the sensible thing and mix some of these code sequences where it is appropriate for performance, e.g. to exploit instruction-level-parallelism. I also don't want to change the condition to a logical or (not short-circuit) because one of the branches will likely jump out.

Any ideas on how this can be accomplished (preferably using gcc)?

Thanks.

Upvotes: 13

Views: 322

Answers (4)

Puppy
Puppy

Reputation: 146940

Don't bother. The CPU already has out-of-order execution, speculative execution and branch prediction. It's pretty unlikely that any difference at this level could make any difference whatsoever. Instruction-level parallelism is done implicitly by the CPU, not explicitly by the compiler. Perhaps GCC didn't do anything because there's nothing to be gained.

On that note, you'd have to have one hell of a condition to make a difference in the running time of a non-trivial application.

Oh, and logical or is Standard-guaranteed to be short circuit.

Upvotes: 1

Tony Delroy
Tony Delroy

Reputation: 106116

Evaluating each of the conditions requires several memory accesses

Why don't you avoid short-circuit evaluation within individual conditions, but let it happen for the or-ing of conditions?

Using operators that are not short-circuit for built-in types

Exactly how you achieve the former depends on the nature of those conditions (i.e. condition1, condition2 in your code) - given you show nothing about them I can only talk in generalities: where they internally contain short-circuit operators, instead convert the boolean value to an integer representation and use e.g. bitwise-OR (or even '+' or '*' if it reads better and works in your particular usage). Bitwise operators are generally safer as they're lower precedence though - only have to be careful if your conditions already include bitwise operators.

To illustrate:

OLD: return (a > 4 && b == 2 && c < a) ||   // condition1
            (a == 3 && b != 2 && c == -a);  // condition2

NEW: return (a > 4 & b == 2 & c < a) ||
            (a == 3 & b != 2 & c == -a);

Also be careful if you used implicit conversion of numbers/pointers to bool before... you want to normalise them to bools so their least-significant bits reflect their boolean significance:

OLD: return my_int && my_point && !my_double;
NEW: return bool(my_int) & bool(my_point) & !my_double;  // ! normalises before bitwise-&

You might also want to benchmark with...

     bool condition1 = a > 4 & b == 2 & c < a;
     bool condition2 = a == 3 & b != 2 & c == -a;
     return condition1 || condition2;

...which might be faster - possibly only in the overall "return false" case and perhaps when the last conditionN or two are the deciding factor in a "return true".

User-defined operators avoid short-circuit evaluation

Seperately, short-circuit evaluation is disabled for objects with overloaded logical operators, which provides another avenue for you to do your checks using the existing notation, but you'll have to change or enhance your data types.

Thoughts

More generally, you'll only benefit from this if you've a large number of assertions combined in each individual condition - more so if the function tends to run through to return false.

"AProgrammer" makes a great point too - with speculative execution available on modern CPUs, the CPU may already be ahead of the ordering implied by short-circuit evaluation (in some special mode than either avoids or suppresses any memory faults from dereferencing invalid pointers, divides by 0 etc). So, it's possible that the entire attempt at optimisation may prove pointless or even counterproductive. Benchmarking of all alternatives is required.

Upvotes: 6

jreitz
jreitz

Reputation: 21

Take a look at the __builtin_expect functions provided by gcc. When one defines likely / unlikely macros as is done with the linux kernel, these can be used intuitively with little impact on code readability.

Upvotes: 2

Goz
Goz

Reputation: 62323

Could you just move the parts of the condition yourself?

ie

 const bool bCondition1Result = <condition1>;
 const bool bCondition2Result = <condition2>;

and so on

For better optimisation still ... re-jig the order of your conditions so that the most hit one is the first to be checked. This way it will early out more often than not (This may make precious little difference).

Upvotes: 5

Related Questions