Reputation: 9278
Edit: It seems this is a compiler bug indeed: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82803
I am writing a wrapper for writing logs that uses TLS to store a std::stringstream
buffer. This code will be used by shared-libraries. When looking at the code on godbolt.org it seems that neither gcc nor clang will cache the result of a TLS lookup (the loop repeatedly calls '__tls_get_addr()' when I believe I have designed my class in a way that should let it.
#include <sstream>
class LogStream
{
public:
LogStream()
: m_buffer(getBuffer())
{
}
LogStream(std::stringstream& buffer)
: m_buffer(buffer)
{
}
static std::stringstream& getBuffer()
{
thread_local std::stringstream buffer;
return buffer;
}
template <typename T>
inline LogStream& operator<<(const T& t)
{
m_buffer << t;
return *this;
}
private:
std::stringstream& m_buffer;
};
int main()
{
LogStream log{};
for (int i = 0; i < 12345678; ++i)
{
log << i;
}
}
Looking at the assembly code output both gcc and clang generate pretty similar output:
clang 5.0.0:
xor ebx, ebx
.LBB0_3: # =>This Inner Loop Header: Depth=1
data16
lea rdi, [rip + LogStream::getBuffer[abi:cxx11]()::buffer[abi:cxx11]@TLSGD]
data16
data16
rex64
call __tls_get_addr@PLT // Called on every loop iteration.
lea rdi, [rax + 16]
mov esi, ebx
call std::basic_ostream<char, std::char_traits<char> >::operator<<(int)@PLT
inc ebx
cmp ebx, 12345678
jne .LBB0_3
gcc 7.2:
xor ebx, ebx
.L3:
lea rdi, guard variable for LogStream::getBuffer[abi:cxx11]()::buffer@tlsld[rip]
call __tls_get_addr@PLT // Called on every loop iteration.
mov esi, ebx
add ebx, 1
lea rdi, LogStream::getBuffer[abi:cxx11]()::buffer@dtpoff[rax+16]
call std::basic_ostream<char, std::char_traits<char> >::operator<<(int)@PLT
cmp ebx, 12345678
jne .L3
How can I convince both compilers that the lookup doesn't need to be repeatedly done?
Compiler options: -std=c++11 -O3 -fPIC
Upvotes: 2
Views: 217
Reputation: 71889
This really looks like an optimization bug in both Clang and GCC.
Here's what I think happens. (I might be completely off.) The compiler completely inlines everything down to this code:
int main()
{
// pseudo-access
std::stringstream& m_buffer = LogStream::getBuffer::buffer;
for (int i = 0; i < 12345678; ++i)
{
m_buffer << i;
}
}
And then, not realizing that access to a thread-local is very expensive under -fPIC
, it decides that the temporary reference to the global is not necessary and inlines that as well:
int main()
{
for (int i = 0; i < 12345678; ++i)
{
// pseudo-access
LogStream::getBuffer::buffer << i;
}
}
Whatever actually happens, this is clearly a pessimization of the code your wrote. You should report this as a bug to GCC and Clang.
GCC bugtracker: https://gcc.gnu.org/bugzilla/
Clang bugtracker: https://bugs.llvm.org/
Upvotes: 2