Reputation: 783
I was playing with the Compiler Explorer and I stumbled upon an interesting behavior with the ternary operator when using something like this:
std::string get_string(bool b)
{
return b ? "Hello" : "Stack-overflow";
}
The compiler generated code for this (clang trunk, with -O3) is this:
get_string[abi:cxx11](bool): # @get_string[abi:cxx11](bool)
push r15
push r14
push rbx
mov rbx, rdi
mov ecx, offset .L.str
mov eax, offset .L.str.1
test esi, esi
cmovne rax, rcx
add rdi, 16 #< Why is the compiler storing the length of the string
mov qword ptr [rbx], rdi
xor sil, 1
movzx ecx, sil
lea r15, [rcx + 8*rcx]
lea r14, [rcx + 8*rcx]
add r14, 5 #< I also think this is the length of "Hello" (but not sure)
mov rsi, rax
mov rdx, r14
call memcpy #< Why is there a call to memcpy
mov qword ptr [rbx + 8], r14
mov byte ptr [rbx + r15 + 21], 0
mov rax, rbx
pop rbx
pop r14
pop r15
ret
.L.str:
.asciz "Hello"
.L.str.1:
.asciz "Stack-Overflow"
However, the compiler generated code for the following snippet is considerably smaller and with no calls to memcpy
, and does not care about knowing the length of both strings at the same time. There are 2 different labels that it jumps to
std::string better_string(bool b)
{
if (b)
{
return "Hello";
}
else
{
return "Stack-Overflow";
}
}
The compiler generated code for the above snippet (clang trunk with -O3) is this:
better_string[abi:cxx11](bool): # @better_string[abi:cxx11](bool)
mov rax, rdi
lea rcx, [rdi + 16]
mov qword ptr [rdi], rcx
test sil, sil
je .LBB0_2
mov dword ptr [rcx], 1819043144
mov word ptr [rcx + 4], 111
mov ecx, 5
mov qword ptr [rax + 8], rcx
ret
.LBB0_2:
movabs rdx, 8606216600190023247
mov qword ptr [rcx + 6], rdx
movabs rdx, 8525082558887720019
mov qword ptr [rcx], rdx
mov byte ptr [rax + 30], 0
mov ecx, 14
mov qword ptr [rax + 8], rcx
ret
The same result is when I use the ternary operator with:
std::string get_string(bool b)
{
return b ? std::string("Hello") : std::string("Stack-Overflow");
}
I would like to know why the ternary operator in the first example generates that compiler code. I believe that the culprit lies within the const char[]
.
P.S: GCC does calls to strlen
in the first example but Clang doesn't.
Link to the Compiler Explorer example: https://godbolt.org/z/Exqs6G
Thank you for your time!
sorry for the wall of code
Upvotes: 70
Views: 4140
Reputation: 280
The first version returns a string object which is initialized with a not-constant expression yielding one of the string literals, so the constructor is run as for any other variable string object, thus the memcpy to do the initialization.
The other variants return either one string object initialized with a string literal or another string object initialized with another string literal, both of which can be optimized to a string object constructed from a constant expression where no memcpy is needed.
So the real answer is: the first version operates the ?: operator on char[] expressions before initializing the objects and the other versions on the string objects already being initialized.
It does not matter whether one of the versions is branchless.
Upvotes: 13
Reputation: 39748
The overarching difference here is that the first version is branchless.
16 isn’t the length of any string here (the longer one, with NUL, is only 15 bytes long); it’s an offset into the return object (whose address is passed in RDI to support RVO), used to indicate that the small-string optimization is in use (note the lack of allocation). The lengths are 5 or 5+1+8 stored in R14, which is stored in the std::string
as well as passed to memcpy
(along with a pointer chosen by CMOVNE) to load the actual string bytes.
The other version has an obvious branch (although part of the std::string
construction has been hoisted above it) and actually does have 5 and 14 explicitly, but is obfuscated by the fact that the string bytes have been included as immediate values (expressed as integers) of various sizes.
As for why these three equivalent functions produce two different versions of the generated code, all I can offer is that optimizers are iterative and heuristic algorithms; they don’t reliably find the same “best” assembly independently of their starting point.
Upvotes: 60