I was playing with Compiler Explorer and found that these two functions generate different assembly in both gcc and clang. I expected that, after inlining, they would produce identical expression trees and hence identical, optimal assembly.
constexpr bool is_nonzero_decimal_digit(char const c) noexcept
{
    return c == '1' || c == '2' || c == '3' || c == '4' || c == '5'
        || c == '6' || c == '7' || c == '8' || c == '9';
}

bool is_decimal_digit_v1(char const c) noexcept
{
    return c == '0' || is_nonzero_decimal_digit(c);
}

bool is_decimal_digit_v2(char const c) noexcept
{
    return c == '0' || c == '1' || c == '2' || c == '3' || c == '4'
        || c == '5' || c == '6' || c == '7' || c == '8' || c == '9';
}
Clang 3.9.1 -std=c++1z -O3 result:

is_decimal_digit_v1(char):
    cmp dil, 48
    sete cl
    add dil, -49
    cmp dil, 9
    setb al
    or al, cl
    ret

is_decimal_digit_v2(char):
    add dil, -48
    cmp dil, 10
    setb al
    ret
gcc 6.3 -std=c++1z -O3 result:

is_decimal_digit_v1(char):
    cmp dil, 48
    je .L3
    sub edi, 49
    cmp dil, 8
    setbe al
    ret
.L3:
    mov eax, 1
    ret

is_decimal_digit_v2(char):
    sub edi, 48
    cmp dil, 9
    setbe al
    ret
So, is this a quirk of optimizers or the result of language rules forbidding optimizations?
Upvotes: 6
Views: 306
It is a quirk of the gcc (before 7.0) and clang optimizers. As Cornstalks pointed out in the comments, gcc 7.0 is able to generate the optimal assembly. I also checked VC++ 2015, which produces it as well:
is_decimal_digit_v2:
    sub cl, 48
    cmp cl, 9
    setbe al
    ret 0

is_decimal_digit_v1:
    sub cl, 48
    cmp cl, 9
    setbe al
    ret 0
As T.C. pointed out, inlining is performed after some earlier optimization passes, which in this particular code merged the chain of comparisons into a simpler range check. Running those passes before inlining is useful because it makes leaf functions smaller, which in turn increases their chances of being inlined. Basically, the v1 function was transformed into something like this:
bool is_decimal_digit_v3(char const c) noexcept
{
    if (c == 48) return true;
    // this is what was inlined
    char tmp = c - 49;
    return tmp >= 0 && tmp < 9;
}
whereas v2 was transformed into a much simpler form:
bool is_decimal_digit_v4(char const c) noexcept
{
    char tmp = c - 48;
    return tmp >= 0 && tmp < 10;
}
The assembly generated for v3 is essentially the same as the one generated for v1:
# clang 3.9.1
is_decimal_digit_v3(char): # @is_decimal_digit_v3(char)
    cmp dil, 48
    sete cl
    add dil, -49
    cmp dil, 9
    setb al
    or al, cl
    ret

# gcc 6.3
is_decimal_digit_v3(char):
    cmp dil, 48
    je .L8
    sub edi, 49
    cmp dil, 8
    setbe al
    ret
.L8:
    mov eax, 1
    ret
I guess transforming v3 into v4 requires some non-trivial analysis that gcc 7.0 is able to perform. That version generates exactly the same assembly for all four snippets:
is_decimal_digit_v1(char):
    sub edi, 48
    cmp dil, 9
    setbe al
    ret

is_decimal_digit_v2(char):
    sub edi, 48
    cmp dil, 9
    setbe al
    ret

is_decimal_digit_v3(char):
    sub edi, 48
    cmp dil, 9
    setbe al
    ret

is_decimal_digit_v4(char):
    sub edi, 48
    cmp dil, 9
    setbe al
    ret
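What that analysis has to establish is that the two cases in v3 cover adjacent sets: c == 48 covers exactly {48} and the inlined check covers [49, 57], so their union is the contiguous range [48, 57] that v4 tests in one step. The small compile-time check below (my own sketch, reusing the unsigned-wrap trick from above; the names v3_form and v4_form are made up) makes that equivalence explicit:

constexpr bool v3_form(char const c) noexcept
{
    // roughly the shape v1 is left in after inlining:
    // special-case '0', then the range test for '1'..'9'
    return c == 48 || static_cast<unsigned char>(c - 49) < 9;
}

constexpr bool v4_form(char const c) noexcept
{
    // the folded form gcc 7.0 reaches: one contiguous range check
    return static_cast<unsigned char>(c - 48) < 10;
}

constexpr bool forms_agree() noexcept
{
    // exhaustively compare the two predicates over every char value
    for (int i = -128; i < 128; ++i)
        if (v3_form(static_cast<char>(i)) != v4_form(static_cast<char>(i)))
            return false;
    return true;
}

static_assert(forms_agree(), "{48} plus [49, 57] is exactly [48, 57]");

If the two predicates disagreed for any of the 256 values, the static_assert would fail, so the fold gcc 7.0 performs is safe.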
Interestingly, VC++ 2015 is not able to transform v3 into v4 and produces this assembly:
is_decimal_digit_v3:
    cmp cl, 48
    jne SHORT $LN2@is_decimal
    mov al, 1
    ret 0
$LN2@is_decimal:
    xor eax, eax
    sub cl, 49
    cmp cl, 8
    setbe al
    ret 0
If I had to guess, I'd say the reason it generates optimal code for v1 but not for v3 is that it performs inlining before reducing the chain of comparisons to a range check.
Upvotes: 3