savx2

Reputation: 1061

Compiler predicate optimizations

Consider the following example conditions/predicates:

  1. x > 10 and x > 20
  2. (x > 10 or x == 10) and (x < 10 or x == 10) aka x >= 10 and x <= 10

Predicate 1 can be simplified to x > 20, and predicate 2 to x == 10. Would a compiler optimize these kinds of (or more complex) predicates, and if so, what algorithms are used to do so?

What are some common optimization techniques for predicates?

Upvotes: 4

Views: 308

Answers (2)

Wilfred Hughes

Reputation: 31151

It depends on the compiler, but clang and gcc do perform this optimisation:

#include <stdio.h>

void foo(int x) {
  if (x > 10 && x > 20)
    puts("foo");
}

void foo2(int x) {
  if ((x > 10 || x == 10) && (x < 10 || x == 10))
    puts("foo2");
}

You can see the generated assembly for yourself (for example on the Compiler Explorer at https://godbolt.org, or locally with clang -O2 -S) -- with optimisations on, both functions compile down to a single comparison.

Clang (which uses LLVM) does this in the instruction combine pass ('instcombine'). You can see some of the transformations in the InstructionSimplify.cpp source file.
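To get a feel for the reasoning behind such a pass, here is a toy sketch of one classic approach -- modelling each comparison of x against a constant as an integer interval, so that && becomes interval intersection and || becomes interval union when the operands overlap or touch. It is written in C# to match the other answer on this page; it is not LLVM's code, and all names are made up:

using System;

// Toy model: each predicate "x CMP constant" denotes an interval of ints.
// && is interval intersection; || is interval union when representable.
readonly struct Interval
{
    public readonly long Lo, Hi;            // inclusive; empty when Lo > Hi
    public Interval(long lo, long hi) { Lo = lo; Hi = hi; }

    public static Interval Gt(int c) => new Interval((long)c + 1, int.MaxValue);
    public static Interval Lt(int c) => new Interval(int.MinValue, (long)c - 1);
    public static Interval Eq(int c) => new Interval(c, c);

    // x in this && x in other  ==>  intersection
    public Interval And(Interval other) =>
        new Interval(Math.Max(Lo, other.Lo), Math.Min(Hi, other.Hi));

    // x in this || x in other  ==>  union, but only when the two pieces
    // overlap or touch; otherwise the result is not a single interval.
    public Interval? Or(Interval other) =>
        Math.Max(Lo, other.Lo) <= Math.Min(Hi, other.Hi) + 1
            ? new Interval(Math.Min(Lo, other.Lo), Math.Max(Hi, other.Hi))
            : (Interval?)null;

    // Map the interval back to the simplest equivalent predicate.
    public override string ToString() =>
        Lo > Hi            ? "false"
      : Lo == Hi           ? $"x == {Lo}"
      : Lo == int.MinValue ? $"x <= {Hi}"
      : Hi == int.MaxValue ? $"x >= {Lo}"
      :                      $"{Lo} <= x <= {Hi}";
}

static class Demo
{
    static void Main()
    {
        // 1. x > 10 && x > 20  ==>  x >= 21, i.e. x > 20
        Console.WriteLine(Interval.Gt(10).And(Interval.Gt(20)));

        // 2. (x > 10 || x == 10) && (x < 10 || x == 10)  ==>  x == 10
        Interval left  = Interval.Gt(10).Or(Interval.Eq(10)).Value;  // x >= 10
        Interval right = Interval.Lt(10).Or(Interval.Eq(10)).Value;  // x <= 10
        Console.WriteLine(left.And(right));
    }
}

Running it prints x >= 21 (equivalent to x > 20) for the first predicate and x == 10 for the second.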

Upvotes: 2

dnickless

Reputation: 10918

Looking at the IL code that the C# compiler emits for the following method, the compiler does not seem smart enough, at least in this case. I'm not sure, though, what happens when the IL gets compiled to native code, or even later in the processor pipeline -- further optimizations kick in there:

private static bool Compare(int x)
{
   return (x > 10 || x == 10) && (x < 10 || x == 10);
}

Corresponding IL:

IL_0000: ldarg.0      // x
IL_0001: ldc.i4.s     10 // 0x0a
IL_0003: bgt.s        IL_000a
IL_0005: ldarg.0      // x
IL_0006: ldc.i4.s     10 // 0x0a
IL_0008: bne.un.s     IL_0017
IL_000a: ldarg.0      // x
IL_000b: ldc.i4.s     10 // 0x0a
IL_000d: blt.s        IL_0015
IL_000f: ldarg.0      // x
IL_0010: ldc.i4.s     10 // 0x0a
IL_0012: ceq          
IL_0014: ret          
IL_0015: ldc.i4.1     
IL_0016: ret          
IL_0017: ldc.i4.0     
IL_0018: ret

Here's the second (optimized) version:

private static bool Compare(int x)
{
   return x >= 10 && x <= 10;
}

And, again, the corresponding IL code:

IL_0000: ldarg.0      // x
IL_0001: ldc.i4.s     10 // 0x0a
IL_0003: blt.s        IL_000e
IL_0005: ldarg.0      // x
IL_0006: ldc.i4.s     10 // 0x0a
IL_0008: cgt          
IL_000a: ldc.i4.0     
IL_000b: ceq          
IL_000d: ret          
IL_000e: ldc.i4.0     
IL_000f: ret          

Since the second version is clearly shorter, it has a better chance of getting inlined at runtime, so we should expect it to run a bit faster.
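Whether a method actually gets inlined is the JIT's decision, and IL size is one of its main heuristics. If you want to influence that decision explicitly, the framework exposes hints via MethodImplAttribute (AggressiveInlining is a request, not a guarantee):

using System.Runtime.CompilerServices;

public static class InliningHints
{
    // Ask the JIT to inline even past its usual size heuristic
    // (a hint only, not a guarantee).
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static bool CompareOpt(int x) => x >= 10 && x <= 10;

    // Forbid inlining, e.g. to keep benchmarked methods comparable.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static bool CompareOptNotInlined(int x) => x >= 10 && x <= 10;
}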

Finally, the third one, let's call it "the best" (x == 10):

private static bool Compare(int x)
{
    return x == 10;
}

And its IL:

IL_0000: ldarg.0      // x
IL_0001: ldc.i4.s     10 // 0x0a
IL_0003: ceq          
IL_0005: ret          

Nice and concise.

Running a benchmark using BenchmarkDotNet with [MethodImpl(MethodImplOptions.NoInlining)] shows that the runtime behaviour still differs substantially between the three implementations:

Case 1: test candidates that are not 10 (negative case).

     Method |       Jit | Platform |     Mean 
----------- |---------- |--------- |----------
   TestBest | LegacyJit |      X64 | 2.329 ms
    TestOpt | LegacyJit |      X64 | 2.704 ms
 TestNonOpt | LegacyJit |      X64 | 3.324 ms
   TestBest | LegacyJit |      X86 | 1.956 ms
    TestOpt | LegacyJit |      X86 | 2.178 ms
 TestNonOpt | LegacyJit |      X86 | 2.796 ms
   TestBest |    RyuJit |      X64 | 2.480 ms
    TestOpt |    RyuJit |      X64 | 2.489 ms
 TestNonOpt |    RyuJit |      X64 | 3.101 ms
   TestBest |    RyuJit |      X86 | 1.865 ms
    TestOpt |    RyuJit |      X86 | 2.170 ms
 TestNonOpt |    RyuJit |      X86 | 2.853 ms

Case 2: test using 10 (positive case).

     Method |       Jit | Platform |     Mean
----------- |---------- |--------- |---------
   TestBest | LegacyJit |      X64 | 2.396 ms
    TestOpt | LegacyJit |      X64 | 2.780 ms
 TestNonOpt | LegacyJit |      X64 | 3.370 ms
   TestBest | LegacyJit |      X86 | 2.044 ms
    TestOpt | LegacyJit |      X86 | 2.199 ms
 TestNonOpt | LegacyJit |      X86 | 2.533 ms
   TestBest |    RyuJit |      X64 | 2.470 ms
    TestOpt |    RyuJit |      X64 | 2.532 ms
 TestNonOpt |    RyuJit |      X64 | 2.552 ms
   TestBest |    RyuJit |      X86 | 1.911 ms
    TestOpt |    RyuJit |      X86 | 2.210 ms
 TestNonOpt |    RyuJit |      X86 | 2.753 ms

Interestingly, in both cases the new JIT (RyuJIT) runs the opt and non-opt X64 versions in about the same time.
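The harness itself isn't shown above; a minimal BenchmarkDotNet reconstruction along these lines (the method names match the tables, but the iteration count and loop bodies are my guesses, and the Jit/Platform matrix would be configured separately via BenchmarkDotNet's job configuration) might look like:

using System.Runtime.CompilerServices;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

public class PredicateBenchmarks
{
    private const int Iterations = 1_000_000;   // assumed iteration count

    [MethodImpl(MethodImplOptions.NoInlining)]
    private static bool CompareNonOpt(int x) =>
        (x > 10 || x == 10) && (x < 10 || x == 10);

    [MethodImpl(MethodImplOptions.NoInlining)]
    private static bool CompareOpt(int x) => x >= 10 && x <= 10;

    [MethodImpl(MethodImplOptions.NoInlining)]
    private static bool CompareBest(int x) => x == 10;

    // XOR-accumulate the results so the calls are not eliminated as dead code.
    [Benchmark] public bool TestNonOpt() { var r = false; for (int i = 0; i < Iterations; i++) r ^= CompareNonOpt(i); return r; }
    [Benchmark] public bool TestOpt()    { var r = false; for (int i = 0; i < Iterations; i++) r ^= CompareOpt(i);    return r; }
    [Benchmark] public bool TestBest()   { var r = false; for (int i = 0; i < Iterations; i++) r ^= CompareBest(i);   return r; }
}

public static class Program
{
    public static void Main() => BenchmarkRunner.Run<PredicateBenchmarks>();
}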

The question remains: why does the compiler not optimize these kinds of patterns? My guess would be that it's because of features like operator overloading, which make it impossible for the compiler to draw such logical conclusions in general -- but I might be way off. For the built-in value types, at least, it should be possible.
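To illustrate that guess: for a user-defined type the rewrite would be outright wrong, because the comparison operators can run arbitrary code. The type below is hypothetical, purely for illustration:

using System;

// Hypothetical type: its comparison operators have observable side effects,
// so "(v > 10 || v == 10)" is not guaranteed to mean "v >= 10" here.
public struct Tracked
{
    public int Value;
    public Tracked(int value) { Value = value; }

    // C# requires these operators in pairs (> with <, == with !=).
    public static bool operator >(Tracked a, int b)
    {
        Console.WriteLine("operator > called");   // observable side effect
        return a.Value > b;
    }
    public static bool operator <(Tracked a, int b)
    {
        Console.WriteLine("operator < called");
        return a.Value < b;
    }
    public static bool operator ==(Tracked a, int b)
    {
        Console.WriteLine("operator == called");
        return a.Value == b;
    }
    public static bool operator !=(Tracked a, int b) => !(a == b);

    public override bool Equals(object obj) => obj is Tracked t && t.Value == Value;
    public override int GetHashCode() => Value;
}

public static class Program
{
    public static void Main()
    {
        var v = new Tracked(10);
        // Folding this to "v == 10" would skip the calls to > and <,
        // which is an observable change in behaviour.
        Console.WriteLine((v > 10 || v == 10) && (v < 10 || v == 10));
    }
}

For a plain int the compiler does know the comparison semantics, but the C# compiler historically performs very few optimizations of its own, leaving most of this work to the JIT.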

Lastly, here's a good article on optimizing boolean expressions for speed: https://hbfs.wordpress.com/2008/08/26/optimizing-boolean-expressions-for-speed/

Upvotes: 1
