James

Reputation: 9278

Result of TLS variable access not cached

Edit: It seems this is a compiler bug indeed: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82803

I am writing a logging wrapper that uses TLS to store a std::stringstream buffer. This code will be used by shared libraries. Looking at the code on godbolt.org, it seems that neither gcc nor clang caches the result of the TLS lookup: the loop calls '__tls_get_addr()' on every iteration, even though I believe I have designed my class in a way that should let the compiler hoist it.

#include <sstream>

class LogStream
{
public:
    LogStream()
    :   m_buffer(getBuffer())
    {
    }

    LogStream(std::stringstream& buffer)
    :   m_buffer(buffer)
    {
    }

    static std::stringstream& getBuffer()
    {
        thread_local std::stringstream buffer;
        return buffer;
    }

    template <typename T>
    inline LogStream& operator<<(const T& t)
    {
        m_buffer << t;
        return *this;
    }

private:
    std::stringstream& m_buffer;
};


int main()
{
    LogStream log{};

    for (int i = 0; i < 12345678; ++i)
    {
        log << i;
    }
}

Looking at the assembly, gcc and clang generate very similar code for the loop:

clang 5.0.0:

xor ebx, ebx
.LBB0_3: # =>This Inner Loop Header: Depth=1
data16
lea rdi, [rip + LogStream::getBuffer[abi:cxx11]()::buffer[abi:cxx11]@TLSGD]
data16
data16
rex64
call __tls_get_addr@PLT    // Called on every loop iteration.
lea rdi, [rax + 16]
mov esi, ebx
call std::basic_ostream<char, std::char_traits<char> >::operator<<(int)@PLT
inc ebx
cmp ebx, 12345678
jne .LBB0_3

gcc 7.2:

xor ebx, ebx
.L3:
lea rdi, guard variable for LogStream::getBuffer[abi:cxx11]()::buffer@tlsld[rip]
call __tls_get_addr@PLT   // Called on every loop iteration.
mov esi, ebx
add ebx, 1
lea rdi, LogStream::getBuffer[abi:cxx11]()::buffer@dtpoff[rax+16]
call std::basic_ostream<char, std::char_traits<char> >::operator<<(int)@PLT
cmp ebx, 12345678
jne .L3

How can I convince both compilers that the lookup doesn't need to be repeatedly done?

Compiler options: -std=c++11 -O3 -fPIC

Godbolt link

Upvotes: 2

Views: 217

Answers (1)

Sebastian Redl

Reputation: 71889

This really looks like an optimization bug in both Clang and GCC.

Here's what I think happens. (I might be completely off.) The compiler completely inlines everything down to this code:

int main()
{
    // pseudo-access
    std::stringstream& m_buffer = LogStream::getBuffer::buffer;
    for (int i = 0; i < 12345678; ++i)
    {
        m_buffer << i;
    }
}

And then, not realizing that access to a thread-local is very expensive under -fPIC, it decides that the temporary reference to the global is not necessary and inlines that as well:

int main()
{
    for (int i = 0; i < 12345678; ++i)
    {
        // pseudo-access
        LogStream::getBuffer::buffer << i;
    }
}

Whatever actually happens, this is clearly a pessimization of the code you wrote. You should report this as a bug to GCC and Clang.

GCC bugtracker: https://gcc.gnu.org/bugzilla/
Clang bugtracker: https://bugs.llvm.org/
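
Until the compilers are fixed, one possible workaround is to hide the origin of the reference from the optimizer. If the theory above is right, the compiler can only recompute the TLS address because it can see that m_buffer refers to a thread-local. The sketch below moves the lookup into a function marked noinline (a GCC/Clang extension), so the caller only receives an opaque reference. The names getBufferOutOfLine, take and logNumbers are made up for this example, and I have not checked the generated assembly on every compiler version.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper, not part of the original post. It is kept out of line
// so the caller only sees an opaque reference and has no TLS address to
// recompute inside a loop. noinline is a GCC/Clang extension.
__attribute__((noinline)) std::stringstream& getBufferOutOfLine()
{
    thread_local std::stringstream buffer;
    return buffer;
}

class LogStream
{
public:
    LogStream() : m_buffer(getBufferOutOfLine()) {}

    template <typename T>
    LogStream& operator<<(const T& t)
    {
        m_buffer << t;
        return *this;
    }

    // Returns the accumulated text and empties the per-thread buffer.
    std::string take()
    {
        std::string text = m_buffer.str();
        m_buffer.str(std::string());
        return text;
    }

private:
    std::stringstream& m_buffer;
};

// Hypothetical driver mirroring the loop in the question.
std::string logNumbers(int count)
{
    assert(count >= 0);
    LogStream log;
    for (int i = 0; i < count; ++i)
    {
        log << i;
    }
    return log.take();
}
```

Compile it with the same -std=c++11 -O3 -fPIC options and check whether '__tls_get_addr' now appears only inside getBufferOutOfLine rather than in the loop. The cost is one real function call per LogStream construction, which should be small compared with a TLS lookup on every iteration.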

Upvotes: 2
