Kaia
Kaia

Reputation: 905

Does the C compiler have to short-circuit && and || unnecessarily?

C short-circuits && and ||.

Unlike the bitwise | operator, the || operator guarantees left-to-right evaluation; if the second operand is evaluated, there is a sequence point between the evaluations of the first and second operands. If the first operand compares unequal to 0, the second operand is not evaluated.

We often use this in idioms like

bool xyz = ptr && *ptr == 42;

But suppose we have very simple operands to the || operator:

int a[3] = {1, 2, 3};
[ ... other code ... ]

int myFunc() {
  int x = a[0] == 1 && a[1] == 2 && a[2] == 3;
}

It seems that if C must guarantee the latter operands are not evaluated, it has to generate something like (forgive my freehanding of poorly-remembered assembly pseudocode):

  xor   eax, eax
  cmp   a+0x0, 0x1 ; a[0] != 1
  jne   end
  cmp   a+0x4, 0x2 ; a[1] != 2
  jne   end
  cmp   a+0x8, 0x3 ; a[2] != 3
  jne   end
  mov   al, 0x1    ; all equal, set 1
  
.end:
  ret

That seems like a lot of branches! My general intuition is that unnecessary branches are bad, because we hit branch prediction issues. (In the past, I have replaced branches with statements like a && b, thinking it would remove the branch! Granted, I think it also makes the code clearer in most cases.)

Here's what I'd expect the non-branching version would look like, and my intuition says we'd prefer this in most cases. (again, forgive the freehand x86):

  xor   eax, eax
  xor   ecx, ecx
  cmp   a+0x0, 0x1 ; a[0] == 1
  sete  al
  cmp   a+0x4, 0x2 ; a[1] == 2
  sete  cl
  and   al, cl
  cmp   a+0x8, 0x3 ; a[2] == 3
  sete  cl
  and   al, cl
  ret

This feels like it is equivalent and removes the branches. (Though maybe loading a page into cache is "side effects" enough?)

I suspect I could encourage the compiler to generate this assembly by making my && into &? However, in general, my experience is that I should trust the compiler to optimize things better than I can.

Related:

Upvotes: 4

Views: 167

Answers (3)

gnasher729
gnasher729

Reputation: 52548

The second example is equivalent to

x = ((a[0] ^ 1) | (a[1] ^ 2) | (a[2] ^ 3)) == 0;

because the array indices are known to be valid and the array elements are initialised. It would be valid for uninitialised array elements if uninitialised elements on a particular implementation behave as if they had an unspecified value, not undefined behaviour.

So evaluation without short circuiting would be possible and would be done by a good compiler if it was faster.

Upvotes: 0

Lundin
Lundin

Reputation: 214040

This feels like it is equivalent and removes the branches.

It is not. You execute the right operands of the && regardless of what the previous results were. None of the mainstream x86 compilers generates code like that because they respect the "short circuit" behavior.

It's always a touchy subject when the compiler's optimizer changes the meaning of the code, compilers avoid that. Rather they seek to omit things from the C code that fill no purpose and that's also how the C standard itself defines optimization too (from C17 5.1.2.3):

An actual implementation need not evaluate part of an expression if it can deduce that its value is not used and that no needed side effects are produced

In your specific case the compiler could omit the right operands of && if it can conclude that no side effects are present there. However, it cannot know at compile time which operands that are zero and non-zero, so it will have to execute the code for all operands of the && until it hits a zero. And then it must stop, as required by the && operator (C17 6.5.13):

If the first operand compares equal to 0, the second operand is not evaluated.

Running your first example on latest gcc x86 -O3 (or equivalent clang), I therefore get similar code to your first asm example:

myFunc:
        xor     eax, eax
        cmp     DWORD PTR [rdi], 1
        je      .L6
.L1:
        ret
.L6:
        cmp     DWORD PTR [rdi+4], 2
        jne     .L1
        xor     eax, eax
        cmp     DWORD PTR [rdi+8], 3
        sete    al
        ret

Changing the C code to int x = 0 && a[0] == 1 && a[1] == 2 && a[2] == 3; gives the expected optimization to just return zero no matter what:

myFunc:
        xor     eax, eax
        ret

If modifying that expression to contain a side-effect somewhere in the right operands, then it is still not evaluated.


  • Is the compiler required to generate "short-circuit" machine code if there are "no" side effects?

It is not allowed to change the meaning of the code regardless of side effects. It cannot execute code that was otherwise left out. It is also not allowed to re-order the evaluations because the && operator comes with a sequence point.

  • Is there a way to let the compiler decide whether to use an early-out or a non-branching strategy?
  • In practice, should this affect how I use & and && in parts of my code?

Well... maybe. && is well-known to cause extra branches and using & is a known way to manually optimize code to reduce branches. Whether or not we should manually optimize code to begin with is the big question. The rule of thumb is that we shouldn't do so unless we have spotted a real bottleneck in the program.

But lets give swapping && for & a try (and note precedence == over &):

int myFunc(int a[]) 
{
  int x = a[0] == 1 & a[1] == 2 & a[2] == 3;
  return x;
}

Does this asm look familiar? (gcc 13.3 x86 -O3)

myFunc:
        cmp     DWORD PTR [rdi], 1
        sete    dl
        cmp     DWORD PTR [rdi+4], 2
        sete    al
        and     edx, eax
        xor     eax, eax
        cmp     DWORD PTR [rdi+8], 3
        sete    al
        and     eax, edx
        ret

It did the very optimization you suggested. This is much more efficient code.

clang did something even fancier without any cmp at all:

myFunc:                                 # @myFunc
        mov     eax, dword ptr [rdi]
        mov     ecx, dword ptr [rdi + 4]
        xor     eax, 1
        xor     ecx, 2
        or      ecx, eax
        mov     edx, dword ptr [rdi + 8]
        xor     edx, 3
        xor     eax, eax
        or      edx, ecx
        sete    al
        ret

Upvotes: 1

Chris Dodd
Chris Dodd

Reputation: 126253

In practice, should this affect how I use & and && in parts of my code?

No it should not. You should write you code in the clearest and most humanly readable form. Microoptimizations like this should only be undertaken if and when profiling of the code has revealed a performance issue.

Is the compiler required to generate "short-circuit" machine code if there are "no" side effects?

The compiler is required to generate code that has the same visible effects as short-circuit code in the source. As long as it has enough information to know that is the case, it can generate any code it wants.


Some current machines will treat short forward conditional branches like this as essentially "conditional skip" instructions -- that is, they'll execute straight through the whole block of code without executing ANY branches (or consuming branch prediction resources, which are precious) and just supress the instructions that should be skipped by the branches. To take advantage of this, you might see code generated where each 'jne' branches to the next 'jne' instruction -- so it would appear to be quite suboptimal, but when executed turns out to be very fast.

Upvotes: 7

Related Questions