I was playing with Compiler Explorer and found that these two functions generate different assembly in both gcc and clang. I expected that, after inlining, they would produce identical expression trees and hence identical, optimal assembly.
constexpr bool is_nonzero_decimal_digit(char const c) noexcept
{
    return c == '1' || c == '2' || c == '3' || c == '4' || c == '5'
        || c == '6' || c == '7' || c == '8' || c == '9';
}

bool is_decimal_digit_v1(char const c) noexcept
{
    return c == '0' || is_nonzero_decimal_digit(c);
}

bool is_decimal_digit_v2(char const c) noexcept
{
    return c == '0' || c == '1' || c == '2' || c == '3' || c == '4'
        || c == '5' || c == '6' || c == '7' || c == '8' || c == '9';
}
Clang 3.9.1 -std=c++1z -O3 result:

is_decimal_digit_v1(char):
    cmp dil, 48
    sete cl
    add dil, -49
    cmp dil, 9
    setb al
    or al, cl
    ret

is_decimal_digit_v2(char):
    add dil, -48
    cmp dil, 10
    setb al
    ret
gcc 6.3 -std=c++1z -O3 result:

is_decimal_digit_v1(char):
    cmp dil, 48
    je .L3
    sub edi, 49
    cmp dil, 8
    setbe al
    ret
.L3:
    mov eax, 1
    ret

is_decimal_digit_v2(char):
    sub edi, 48
    cmp dil, 9
    setbe al
    ret
So, is this a quirk of optimizers or the result of language rules forbidding optimizations?
Upvotes: 6
Views: 306
It is a quirk of the gcc (before 7.0) and clang optimizers. As Cornstalks pointed out in the comments, gcc 7.0 is able to generate the optimal assembly. I also checked VC++ 2015, which produces it as well:
is_decimal_digit_v2:
    sub cl, 48
    cmp cl, 9
    setbe al
    ret 0

is_decimal_digit_v1:
    sub cl, 48
    cmp cl, 9
    setbe al
    ret 0
As T.C. pointed out, inlining is performed after some earlier optimization passes, which in this particular code merged the chain of comparisons into a simpler range check. Running those passes before inlining is useful because it makes leaf functions smaller, which in turn increases their chances of being inlined. Basically, the v1 function was transformed into something like this:
bool is_decimal_digit_v3(char const c) noexcept
{
    if (c == 48) return true;
    // this is what was inlined
    char tmp = c - 49;
    return tmp >= 0 && tmp < 9;
}
whereas v2 was transformed into a much simpler form:
bool is_decimal_digit_v4(char const c) noexcept
{
    char tmp = c - 48;
    return tmp >= 0 && tmp < 10;
}
The assembly generated for v3 is essentially the same as the one generated for v1:
# clang 3.9.1
is_decimal_digit_v3(char): # @is_decimal_digit_v3(char)
    cmp dil, 48
    sete cl
    add dil, -49
    cmp dil, 9
    setb al
    or al, cl
    ret

# gcc 6.3
is_decimal_digit_v3(char):
    cmp dil, 48
    je .L8
    sub edi, 49
    cmp dil, 8
    setbe al
    ret
.L8:
    mov eax, 1
    ret
I guess transforming v3 into v4 requires some non-trivial analysis that gcc 7.0 is able to perform. That version generates exactly the same assembly for all four snippets:
is_decimal_digit_v1(char):
    sub edi, 48
    cmp dil, 9
    setbe al
    ret

is_decimal_digit_v2(char):
    sub edi, 48
    cmp dil, 9
    setbe al
    ret

is_decimal_digit_v3(char):
    sub edi, 48
    cmp dil, 9
    setbe al
    ret

is_decimal_digit_v4(char):
    sub edi, 48
    cmp dil, 9
    setbe al
    ret
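What that analysis has to establish is that the two cases in v3 cover adjacent sets: c == 48 covers exactly {48} and the inlined check covers [49, 57], so their union is the contiguous range [48, 57] that v4 tests in one step. The small compile-time check below (my own sketch, reusing the unsigned-wrap trick from above; the names v3_form and v4_form are made up) makes that equivalence explicit:

constexpr bool v3_form(char const c) noexcept
{
    // roughly the shape v1 is left in after inlining:
    // special-case '0', then the range test for '1'..'9'
    return c == 48 || static_cast<unsigned char>(c - 49) < 9;
}

constexpr bool v4_form(char const c) noexcept
{
    // the folded form gcc 7.0 reaches: one contiguous range check
    return static_cast<unsigned char>(c - 48) < 10;
}

constexpr bool forms_agree() noexcept
{
    // exhaustively compare the two predicates over every char value
    for (int i = -128; i < 128; ++i)
        if (v3_form(static_cast<char>(i)) != v4_form(static_cast<char>(i)))
            return false;
    return true;
}

static_assert(forms_agree(), "{48} plus [49, 57] is exactly [48, 57]");

If the two predicates disagreed for any of the 256 values, the static_assert would fail, so the fold gcc 7.0 performs is safe.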
Interestingly, VC++ 2015 is not able to transform v3 into v4 and produces this assembly:
is_decimal_digit_v3:
    cmp cl, 48
    jne SHORT $LN2@is_decimal
    mov al, 1
    ret 0
$LN2@is_decimal:
    xor eax, eax
    sub cl, 49
    cmp cl, 8
    setbe al
    ret 0
If I had to guess, I'd say the reason it generates optimal code for v1 but not for v3 is that it performs inlining before reducing the chain of comparisons to a range check.
Upvotes: 3