Unsigned

Reputation: 9916

Short-circuiting on boolean operands without side effects

For the bounty: How can this behavior be disabled on a case-by-case basis without disabling optimization or lowering the optimization level?

The following conditional expression was compiled with MinGW GCC 3.4.5, where a is of type signed long and m is of type unsigned long.

if (!a && m > 0x002 && m < 0x111)

The CFLAGS used were -g -O2. Here is the corresponding assembly output from GCC (dumped with objdump):

120:    8b 5d d0                mov    ebx,DWORD PTR [ebp-0x30]
123:    85 db                   test   ebx,ebx
125:    0f 94 c0                sete   al
128:    31 d2                   xor    edx,edx
12a:    83 7d d4 02             cmp    DWORD PTR [ebp-0x2c],0x2
12e:    0f 97 c2                seta   dl
131:    85 c2                   test   edx,eax
133:    0f 84 1e 01 00 00       je     257 <_MyFunction+0x227>
139:    81 7d d4 10 01 00 00    cmp    DWORD PTR [ebp-0x2c],0x110
140:    0f 87 11 01 00 00       ja     257 <_MyFunction+0x227>

The instructions at 120-131 can easily be traced as first evaluating !a, followed by the evaluation of m > 0x002. The first conditional jump does not occur until 133. By this time, two expressions have been evaluated, regardless of the outcome of the first expression, !a. If a is nonzero, !a is false and the whole expression can (and should) be concluded immediately, which is not done here.

How does this relate to the C standard, which requires Boolean operators to short-circuit as soon as the outcome can be determined?

Upvotes: 7

Views: 1023

Answers (5)

Keith Thompson

Reputation: 263237

The code is behaving correctly (i.e., in accordance with the requirements of the language standard) either way.

It appears that you're trying to find a way to generate specific assembly code. Of two possible assembly code sequences, both of which behave the same way, you find one satisfactory and the other unsatisfactory.

The only really reliable way to guarantee the satisfactory assembly code sequence is to write the assembly code explicitly. gcc does support inline assembly.

C code specifies behavior. Assembly code specifies machine code.

But all this raises the question: why does it matter to you? (I'm not saying it shouldn't, I just don't understand why it should.)

EDIT: How exactly are a and m defined? If, as you suggest, they're related to memory-mapped devices, then they should be declared volatile -- and that might be exactly the solution to your problem. If they're just ordinary variables, then the compiler can do whatever it likes with them (as long as it doesn't affect the program's visible behavior) because you didn't ask it not to.

Upvotes: 3

Seth

Reputation: 2667

As others have mentioned, this assembly output is a compiler optimization that doesn't affect program execution (as far as the compiler can tell). If you want to selectively disable this optimization, you need to tell the compiler that your variables should not be optimized across the sequence points in the code.

Sequence points occur at the end of each full expression (including the controlling expressions of if, switch, while and do, each of the three expressions of for, and the expression in a return statement) and after the first operand of the logical AND (&&), logical OR (||), conditional (?:) and comma operators.

To prevent compiler optimization across these points, you must declare your variable volatile. In your example, you can specify:

volatile long a;
unsigned long m;
{...}
if (!a && m > 0x002 && m < 0x111) {...}

This works because volatile tells the compiler that it cannot predict the value of the variable: every access to a volatile object counts as observable behavior. The compiler must therefore perform those accesses exactly as written, in order with respect to the sequence points in your code.
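As a rough illustration of the volatile approach, here is a minimal sketch that declares both operands volatile. The file-scope variables and the helper names set_inputs and in_window are made up for this example; they are not from the question. One caveat, stated as an assumption about the goal: volatile constrains only accesses to the object it qualifies, so if the read of m must also be skipped when a is nonzero, m probably needs the qualifier as well.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the question's variables.
   Both are volatile, so every read below is an observable access. */
static volatile long a;
static volatile unsigned long m;

/* Hypothetical helper that stores test values into the two variables. */
void set_inputs(long new_a, unsigned long new_m)
{
    a = new_a;
    m = new_m;
}

/* Same condition as in the question. Because a and m are volatile,
   m may only be read after a has been read and found to be zero. */
bool in_window(void)
{
    return !a && m > 0x002 && m < 0x111;
}
```

Note that each mention of a volatile variable is a separate read, so m can be read twice here. If a single read is wanted, copy it into a local variable after the !a test and compare the local instead.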

Upvotes: 5

R.. GitHub STOP HELPING ICE

Reputation: 215201

The C standard only specifies the behavior of an "abstract machine"; it does not specify the generation of assembly. As long as the observable behavior of a program matches that on the abstract machine, the implementation can use whatever physical mechanism it likes for implementing the language constructs. The relevant section in the standard (C99) is 5.1.2.3 Program execution.

Upvotes: 11

Jon Bright

Reputation: 13738

The compiler's optimising - it loads a into EBX, sets AL (the low byte of EAX) to the result of !a, computes the second check into DL (the low byte of EDX), then branches on the combined test of EAX and EDX. This saves a branch and leaves the code running faster, without making any difference at all in terms of side effects.

If you compile with -O0 rather than -O2, I imagine it will produce more naive assembly that more closely matches your expectations.
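To make the transformation concrete, here is a sketch in C of roughly what the -O2 output computes, next to the condition as written. The function names cond_short_circuit, cond_merged and forms_agree are invented for this illustration, and the mapping from C lines to instructions in the comments is my reading of the listing in the question, not something GCC documents.

```c
#include <stdbool.h>
#include <stddef.h>

/* The condition as written in the question. */
bool cond_short_circuit(long a, unsigned long m)
{
    return !a && m > 0x002 && m < 0x111;
}

/* Approximately what the optimized assembly does: the first two tests
   are evaluated without branching and merged with a bitwise AND. */
bool cond_merged(long a, unsigned long m)
{
    unsigned a_is_zero = (a == 0);     /* test ebx,ebx ; sete al     */
    unsigned m_above_2 = (m > 0x002);  /* cmp [m],0x2  ; seta dl     */

    if ((a_is_zero & m_above_2) == 0)  /* test edx,eax ; je          */
        return false;
    return m <= 0x110;                 /* cmp [m],0x110 ; ja         */
}

/* Compares both forms for a = -1, 0, 1 and every m in [0, limit). */
bool forms_agree(unsigned long limit)
{
    static const long a_values[] = { -1, 0, 1 };

    for (size_t i = 0; i < sizeof a_values / sizeof a_values[0]; i++) {
        for (unsigned long m = 0; m < limit; m++) {
            if (cond_short_circuit(a_values[i], m) != cond_merged(a_values[i], m))
                return false;
        }
    }
    return true;
}
```

Since neither comparison has a side effect and both operands are ordinary (non-volatile) variables, the two functions return the same value for every input, which is why the compiler is free to pick the merged form.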

Upvotes: 4

David Brown

Reputation: 13526

It is probably a compiler optimization since comparing integral types has no side effects. You could try compiling without optimizations or using a function that has side effects instead of the comparison operator and see if it still does this.

For example, try

if (printf("a") || printf("b")) {
    printf("c\n");
}

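and it should print ac, because the first printf returns a nonzero value (the number of characters written), so the second printf is never called.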
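The same check can be made without relying on printed output. The sketch below counts how many operands actually get evaluated; probe, calls, and_evaluations and or_evaluations are hypothetical names chosen for this example.

```c
static int calls;  /* how many times probe() has run since the last reset */

/* Operand with a side effect: bumps the counter, then yields its argument. */
int probe(int value)
{
    calls++;
    return value;
}

/* Number of operands evaluated for probe(first) && probe(second). */
int and_evaluations(int first, int second)
{
    calls = 0;
    (void)(probe(first) && probe(second));
    return calls;
}

/* Number of operands evaluated for probe(first) || probe(second). */
int or_evaluations(int first, int second)
{
    calls = 0;
    (void)(probe(first) || probe(second));
    return calls;
}
```

Because probe has a visible side effect, the compiler has to preserve short-circuiting at any optimization level: and_evaluations(0, 1) yields 1 and or_evaluations(1, 0) yields 1, while and_evaluations(1, 1) and or_evaluations(0, 0) both yield 2.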

Upvotes: 6
