user13709754

Are ternaries or if statements faster?

So I have two options; both functions have identical types:

(Entry->d_type == DT_DIR ? rmdirr : remove)(CurrentEntryPath);

Or

if (Entry->d_type == DT_DIR) {
    rmdirr(CurrentEntryPath);
} else {
    remove(CurrentEntryPath);
}

I have confirmed that the ternary is 100% safe, because both functions have compatible pointer types. Which one is faster (even if less readable)?

Upvotes: 0

Views: 156

Answers (3)

John Bode

Reputation: 123578

Rule #0 - Do not think in terms of raw speed; instead, think in terms of "which would I rather fix 8 months from now when someone reports a bug".

Rule #1 - Measure, don't guess, and don't ask people who don't have access to your system to guess. Code up both versions on the target system and profile them - examine the generated machine code, and run each version against a large enough test set to generate usable statistics and analyze the results. Consider how it is used - is it called thousands of times in a tight loop, or is it called once over the lifetime of the program? Each function involves updating the file system, which will take many orders of magnitude more time to execute than deciding which one to call regardless of which method you use.
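
As a rough sketch of what "code up both versions and profile them" could look like, the harness below times each form of the dispatch in a loop. The names `count_rmdir`, `count_remove`, `dispatch_ternary`, `dispatch_if` and `time_dispatch` are hypothetical and not from the question; the two stand-in functions only bump counters, so nothing on disk is touched.

```c
#include <time.h>

/* Hypothetical stand-ins for rmdirr/remove: they only count calls,
   so the benchmark never touches the file system. */
volatile unsigned long dir_calls, file_calls;

int count_rmdir(const char *path)  { (void)path; dir_calls++;  return 0; }
int count_remove(const char *path) { (void)path; file_calls++; return 0; }

/* Version 1: the ternary selects a function, which is then called. */
int dispatch_ternary(int is_dir, const char *path)
{
    return (is_dir ? count_rmdir : count_remove)(path);
}

/* Version 2: if-else selects which call to make. */
int dispatch_if(int is_dir, const char *path)
{
    if (is_dir)
        return count_rmdir(path);
    else
        return count_remove(path);
}

/* Calls `dispatch` n times, alternating directory/file, and returns the
   CPU time used in seconds. The alternating pattern is an assumption;
   real directory listings may be more or less predictable. */
double time_dispatch(int (*dispatch)(int, const char *), long n)
{
    clock_t start = clock();
    for (long i = 0; i < n; i++)
        dispatch((int)(i & 1), "unused");
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling `time_dispatch(dispatch_ternary, n)` and `time_dispatch(dispatch_if, n)` for a large `n`, several times each, gives numbers to compare. The compiler may well emit identical code for both, which is itself a useful result.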

Rule #2 - It doesn't matter how fast your code is if it gives you the wrong answer, or does the wrong thing, or exposes your credit card information to the world, or blows up if someone in the next room sneezes, or if nobody (including yourself) can fix or update it. Code for correctness first, then for readability and maintainability, then for safety and reliability, and then for speed. Most of your significant speed gains come from using the right algorithm and data structure, not your choice of flow control structure.

Rule #3 - Do not use the ternary operator in place of an if-else structure just for flow control; that's not its job. While the first version works, it's a bit eye-stabby and hard to read at a glance, and when you pick it back up six months from now you're going to ask yourself why you did that. And I can practically guarantee it won't be measurably faster or slower than the other method.
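
To illustrate the distinction, here is a minimal sketch: the ternary reads naturally when it picks between two values, while if-else reads naturally when it picks between two actions. All names (`stub_rmdirr`, `stub_remove`, `entry_kind`, `remove_entry`) are hypothetical, and the stubs merely return distinct codes instead of deleting anything.

```c
/* Hypothetical stand-ins for rmdirr and remove from the question;
   they return distinct codes so the chosen path is visible. */
int stub_rmdirr(const char *path) { (void)path; return 1; }
int stub_remove(const char *path) { (void)path; return 2; }

/* Ternary used for what it is good at: choosing between two values. */
const char *entry_kind(int is_dir)
{
    return is_dir ? "directory" : "file";
}

/* if-else used for flow control: choosing between two actions. */
int remove_entry(int is_dir, const char *path)
{
    if (is_dir) {
        return stub_rmdirr(path);
    } else {
        return stub_remove(path);
    }
}
```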

I'm not saying that speed doesn't matter - I'm saying that speed is only one thing that needs to be considered, and unless you're working in specific domains, it's not the most important thing.

Upvotes: 5

Petr Skocik

Reputation: 60143

It's very conceivable that an optimizing compiler will generate the same code for the two cases.

Curiously, gcc and clang don't do that in this case: they generate code that literally uses function pointers for the ?: case and direct jumps for the if-else case.

Example:

#include <dirent.h>
#include <stdio.h>
#include <unistd.h>

int rmitem0(struct dirent const*Entry)
{
    return (Entry->d_type == DT_DIR ? rmdir : remove)(Entry->d_name);
}

int rmitem1(struct dirent const*Entry)
{
    if (Entry->d_type == DT_DIR)
        return rmdir(Entry->d_name);
    else return remove(Entry->d_name);
}

x86_64 clang:

0000000000000000 <rmitem0>:
   0:   80 7f 12 04             cmp    BYTE PTR [rdi+0x12],0x4
   4:   b8 00 00 00 00          mov    eax,0x0  5: R_X86_64_32  rmdir
   9:   b9 00 00 00 00          mov    ecx,0x0  a: R_X86_64_32  remove
   e:   48 0f 44 c8             cmove  rcx,rax
  12:   48 83 c7 13             add    rdi,0x13
  16:   ff e1                   jmp    rcx

0000000000000018 <rmitem1>:
  18:   80 7f 12 04             cmp    BYTE PTR [rdi+0x12],0x4
  1c:   48 8d 7f 13             lea    rdi,[rdi+0x13]
  20:   0f 85 00 00 00 00       jne    26 <rmitem1+0xe> 22: R_X86_64_PLT32  remove-0x4
  26:   e9 00 00 00 00          jmp    2b <rmitem1+0x13>    27: R_X86_64_PLT32  rmdir-0x4

x86_64 gcc:

0000000000000000 <rmitem0>:
   0:   80 7f 12 04             cmp    BYTE PTR [rdi+0x12],0x4
   4:   74 09                   je     f <rmitem0+0xf>
   6:   48 8b 05 00 00 00 00    mov    rax,QWORD PTR [rip+0x0]        # d <rmitem0+0xd> 9: R_X86_64_REX_GOTPCRELX   remove-0x4
   d:   eb 07                   jmp    16 <rmitem0+0x16>
   f:   48 8b 05 00 00 00 00    mov    rax,QWORD PTR [rip+0x0]        # 16 <rmitem0+0x16>   12: R_X86_64_REX_GOTPCRELX  rmdir-0x4
  16:   48 83 c7 13             add    rdi,0x13
  1a:   ff e0                   jmp    rax

000000000000001c <rmitem1>:
  1c:   4c 8d 47 13             lea    r8,[rdi+0x13]
  20:   80 7f 12 04             cmp    BYTE PTR [rdi+0x12],0x4
  24:   4c 89 c7                mov    rdi,r8
  27:   75 05                   jne    2e <rmitem1+0x12>
  29:   e9 00 00 00 00          jmp    2e <rmitem1+0x12>    2a: R_X86_64_PLT32  rmdir-0x4
  2e:   e9 00 00 00 00          jmp    33 <rmitem1+0x17>    2f: R_X86_64_PLT32  remove-0x4

These two strategies should therefore have slightly different performance characteristics here, but in any case you're missing the forest for a tiny tree.

I've measured the duration of an rmdir call to be about 14 µs on Linux.

The conditionals above should take a fraction of a nanosecond, a few nanoseconds at most: that's roughly 10,000 times faster than your bottleneck (14 µs is 14,000 ns).
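
For anyone who wants to reproduce that kind of number, here is one possible way to time rmdir. It is a sketch under assumptions, not necessarily how the figure above was obtained: `average_rmdir_us` is a hypothetical helper, it only times empty directories, and the result will vary with the file system and kernel.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Creates and removes n empty directories under `base` and returns the
   average wall-clock time of one rmdir call in microseconds.
   Returns -1.0 if n <= 0 or if any mkdir/rmdir call fails. */
double average_rmdir_us(const char *base, int n)
{
    double total_ns = 0.0;
    char path[4096];

    if (n <= 0)
        return -1.0;

    for (int i = 0; i < n; i++) {
        struct timespec t0, t1;
        int len = snprintf(path, sizeof path, "%s/rmdir_bench_%ld_%d",
                           base, (long)getpid(), i);
        if (len < 0 || (size_t)len >= sizeof path)
            return -1.0;
        if (mkdir(path, 0700) != 0)
            return -1.0;

        /* Only the rmdir call itself is timed, not the mkdir. */
        clock_gettime(CLOCK_MONOTONIC, &t0);
        int rc = rmdir(path);
        clock_gettime(CLOCK_MONOTONIC, &t1);
        if (rc != 0)
            return -1.0;

        total_ns += (double)(t1.tv_sec - t0.tv_sec) * 1e9
                  + (double)(t1.tv_nsec - t0.tv_nsec);
    }
    return total_ns / n / 1000.0;
}
```

Something like `average_rmdir_us("/tmp", 1000)` gives a per-call average that can be set against the nanosecond-scale cost of the conditional.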

Upvotes: 2

0___________

Reputation: 68013

It is very difficult to judge which is actually more efficient. The if-else produces fewer instructions, but it contains a branch instruction that requires a pipeline flush if the branch is mispredicted.

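#include <stdlib.h>   /* rand */

#define SOMEVALUE 5

/* noinline keeps foo and boo as separate call targets */
int __attribute__((noinline)) foo(int x)
{
    return rand();
}

int __attribute__((noinline)) boo(int x)
{
    return rand();
}

/* if-else: a conditional branch plus direct jumps */
int aaa(int x)
{
    int result;

    if (x == SOMEVALUE)
        result = foo(x);
    else
        result = boo(x);

    return result;
}

/* ternary: selects a function pointer, then one indirect jump */
int bbb(int x)
{
    return (x == SOMEVALUE ? foo : boo)(x);
}

int (*z[2])(int) = {foo, boo};

/* table lookup: x == SOMEVALUE gives index 1, so this calls boo where
   aaa and bbb call foo; harmless here because both just call rand() */
int ccc(int x)
{
    return z[x == SOMEVALUE](x);
}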

and the resulting code:

foo:
        jmp     rand
boo:
        jmp     rand
aaa:
        cmp     edi, 5
        je      .L6
        jmp     boo
.L6:
        jmp     foo
bbb:
        cmp     edi, 5
        mov     eax, OFFSET FLAT:foo
        mov     edx, OFFSET FLAT:boo
        cmovne  rax, rdx
        jmp     rax
ccc:
        xor     eax, eax
        cmp     edi, 5
        sete    al
        jmp     [QWORD PTR z[0+rax*8]]
z:
        .quad   foo
        .quad   boo

https://godbolt.org/z/L6CFs9

In my opinion, if you attempt such a micro-optimization in less trivial code, you need to look at the generated code and decide which version is more efficient.

Upvotes: 4
