msc
msc

Reputation: 34568

Does the order of cases in a switch statement affect performance?

I have a switch case program:

Ascending order switch cases :

int main()
{
        int a, sc = 1;
        switch (sc)
        {
                case 1:
                        a = 1;
                        break;
                case 2:
                        a = 2;
                        break;
        }
}

Assembly of code:

main:
        push    rbp
        mov     rbp, rsp
        mov     DWORD PTR [rbp-4], 1
        mov     eax, DWORD PTR [rbp-4]
        cmp     eax, 1
        je      .L3
        cmp     eax, 2
        je      .L4
        jmp     .L2
.L3:
        mov     DWORD PTR [rbp-8], 1
        jmp     .L2
.L4:
        mov     DWORD PTR [rbp-8], 2
        nop
.L2:
        mov     eax, 0
        pop     rbp
        ret

Descending order switch cases:

int main()
{
        int a, sc = 1;
        switch (sc)
        {
                case 2:
                        a = 1;
                        break;
                case 1:
                        a = 2;
                        break;
        }
}

Assembly of code:

main:
        push    rbp
        mov     rbp, rsp
        mov     DWORD PTR [rbp-4], 1
        mov     eax, DWORD PTR [rbp-4]
        cmp     eax, 1
        je      .L3
        cmp     eax, 2
        jne     .L2
        mov     DWORD PTR [rbp-8], 1
        jmp     .L2
.L3:
        mov     DWORD PTR [rbp-8], 2
        nop
.L2:
        mov     eax, 0
        pop     rbp
        ret

Here, ascending order cases generated more assembly than descending order.

So, if I have more numbers of switch cases, then does the order of cases affect performance?

Upvotes: 40

Views: 15733

Answers (7)

user8880145
user8880145

Reputation: 53

You should probably enable the optimisations for your compiler before comparing assembly code, however the problem is that your variable is known at compile time so the compiler can remove everything from your function because it doesn't have any side effects.

This example shows that even if you change the order of the cases in a switch statement in your example, GCC and most other compilers will reorder them if the optimisations are enabled. I used extern functions to make sure the values are only known at run-time but I could also have used rand for example.

Also, when you add more cases, the compiler may replace the conditional jumps by a table which contains the addresses of the functions and it will still get reordered by GCC as can be seen here.

Upvotes: 4

supercat
supercat

Reputation: 81115

In cases where most case labels are consecutive, compilers will often process switch statements to use jump tables rather than comparisons. The exact means by which compilers decide what form of computed jump to use (if any) will vary among different implementations. Sometimes adding extra cases to a switch statement may improve performance by simplifying a compiler's generated code (e.g. if code uses cases 4-11, while cases 0-3 are handled in default fashion, adding explicit case 0:; case 1:; case 2:; case 3:; prior to the default: may result in the compiler comparing the operand against 12 and, if it's less, using a 12-item jump table. Omitting those cases might cause the compiler to subtract 4 before comparing the difference against 8, and then use an 8-item table.

One difficulty in trying to optimize switch statements is that compilers generally know better than programmers how the performance of different approaches would vary when given certain inputs, but programmers may know better than compilers what distribution of inputs a program would receive. Given something like:

if (x==0)
  y++;
else switch(x)
{
  ...
}

a "smart" compiler might recognize that changing the code to:

switch(x)
{
  case 0:
    y++;
    break;
  ...
}

could eliminate a comparison in all cases where x is non-zero, at the cost of a computed jump when x is zero. If x is non-zero most of the time, that would be a good trade. If x is zero 99.9% of the time, however, that might be a bad trade. Different compiler writers differ as to the extent to which they will try to optimize constructs like former into the latter.

Upvotes: 7

alinsoar
alinsoar

Reputation: 15783

The switch statement is usually compiled via jump tables not by simple comparaisons.

So, there is no loss in performance if you permute the case-statements.

However, sometimes it is useful to keep more cases in consecutive order and not to use break/return in some entries, in order for the flow of execution to go to the next case and avoid duplicating the code.

When the differences of numbers between case number are big from one case to the other, such in case 10: and case 200000: the compiler will surely not generate jump tables, as it should fill about 200K entries almost all with a pointer toward the default: case, and in this case it will use comparaisons.

Upvotes: 5

Graham
Graham

Reputation: 1705

Your question is very simple - your code isn't the same, so it won't produce the same assembly! Optimised code doesn't just depend on the individual statements, but also on everything around it. And in this case, it's easy to explain the optimisation.

In your first example, case 1 results in a=1, and case 2 results in a=2. The compiler can optimise this to set a=sc for those two cases, which is a single statement.

In your second example, case 1 results in a=2, and case 2 results in a=1. The compiler can no longer take that shortcut, so it has to explicitly set a=1 or a=2 for both cases. Of course this needs more code.

If you simply took your first example and swapped the order of the cases and conditional code then you should get the same assembler.

You can test this optimisation by using the code

int main()
{
    int a, sc = 1;

    switch (sc)
    {
        case 1:
        case 2:
            a = sc;
            break;
    }
}

which should also give exactly the same assembler.

Incidentally, your test code assumes that sc is actually read. Most modern optimising compilers are able to spot that sc does not change between assignment and the switch statement, and replace reading sc with a constant value 1. Further optimisation will then remove the redundant branch(es) of the switch statement, and then even the assignment could be optimised away because a does not actually change. And from the point of view of variable a, the compiler may also discover that a is not read elsewhere and so remove that variable from the code completely.

If you really want sc to be read and a to be set, you need to declare them both volatile. Fortunately the compiler seems to have implemented it the way you expected - but you absolutely cannot expect this when you have optimisation turned on.

Upvotes: 4

Michael Geary
Michael Geary

Reputation: 28850

You're looking at unoptimized code, so studying it for performance isn't very meaningful. If you look at optimized code for your examples, you'll find that it doesn't do the comparisons at all! The optimizer notices that the switch variable sc always has the value 1, so it removes the unreachable case 2.

The optimizer also sees that the variable a isn't used after it's assigned, so it removes the code in case 1 as well, leaving main() an empty function. And it removes the function prolog/epilog that manipulates rbp since that register is unused.

So the optimized code ends up the same for either version of your main() function:

main:
    xor eax, eax
    ret

In short, for the code in the question, it doesn't matter which order you put the case statements, because none of that code will be generated at all.

Would the case order matter in a more real-life example where the code actually is generated and used? Probably not. Note that even in your unoptimized generated code, both versions test for the two case values in numeric order, checking first for 1 and then for 2, regardless of the order in the source code. Clearly the compiler is doing some sorting even in the unoptimized code.

Be sure to note Glenn and Lundin's comments: the order of the case sections is not the only change between your two examples, the actual code is different too. In one of them, the case values match the values set into a, but not so in the other.

Compilers use various strategies for switch/case statements depending on the actual values used. They may use a series of comparisons as in these examples, or perhaps a jump table. It can be interesting to study the generated code, but as always, if performance matters, watch your optimization settings and test it in a real-life situation.

Upvotes: 59

Compiler optimization of switch statements is tricky. Of course, you need to enable optimizations (e.g. try to compile your code with gcc -O2 -fverbose-asm -S with GCC and look inside the generated .s assembler file). BTW on both of your examples my GCC 7 on Debian/Sid/x86-64 gives simply:

        .type   main, @function
main:
.LFB0:
        .cfi_startproc
# rsp.c:13: }
        xorl    %eax, %eax      #
        ret
        .cfi_endproc

(so there in no trace of switch in that generated code)

If you need to understand how a compiler could optimize switch, there are some papers on that subject, such as this one.

If I have more numbers of switch cases, then an order of cases effect on performance?

Not in general, if you are using some optimizing compiler and asking it to optimize. See also this.

If that matters to you so much (but it should not, leave micro-optimizations to your compiler!), you need to benchmark, to profile and perhaps to study the generated assembler code. BTW, cache misses and register allocation could matter much more than order of case-s so I think you should not bother at all. Keep in mind the approximate timing estimates of recent computers. Put the cases in the most readable order (for the next developer working on that same source code). Read also about threaded code. If you have objective (performance related) reasons to re-order the case-s (which is very unlikely and should happen at most once in your lifetime), write some good comment explaining those reasons.

If you care that much about performance, be sure to benchmark and profile, and choose a good compiler and use it with relevant optimization options. Perhaps experiment several different optimization settings (and maybe several compilers). You may want to add -march=native (in addition of -O2 or -O3). You could consider compiling and linking with -flto -O2 to enable link-time optimizations, etc. You might also want profile based optimizations.

BTW, many compilers are huge free software projects (in particular GCC and Clang). If you care that much about performance, you might patch the compiler, extend it by adding some additional optimization pass (by forking the source code, by adding some plugin to GCC or some GCC MELT extensions). That requires months or years of work (notably to understand the internal representations and organization of that compiler).

(Don't forget to take development costs into account; in most cases, they cost much more)

Upvotes: 18

Thilo
Thilo

Reputation: 262464

Performance would depend mostly of the number of branch misses for a given dataset, not so much on the total number of cases. And that in turn depends highly on the actual data and how the compiler chose to implement the switch (dispatch table, chained conditionals, tree of conditionals -- not sure if you can even control this from C).

Upvotes: 6

Related Questions