Reputation: 3819

Confused about the use of C++ (static) thread_local declared inside a function passed to (j)thread

I have started to see a few C++-related posts on Stack Overflow in which people suggest using thread_local within the function that is passed to (j)thread. For example:

How do I generate thread-safe uniform random numbers?

Say we have something like this:

#include <iostream>
#include <mutex>
#include <random>
#include <thread>

std::mutex cout_mutex; // guards std::cout so output lines do not interleave

void thread_function()
{
    static thread_local std::default_random_engine gen;
    std::uniform_real_distribution<float> dist(0.0f, 1.f);
    unsigned int a{ 1 };

    float b = a * dist(gen);

    {
        std::lock_guard<std::mutex> lock(cout_mutex);
        std::cout << "b: " << b << '\n';
    }
}

int main()
{
    std::jthread A(thread_function);
    std::jthread B(thread_function);
    A.join();
    B.join();

    return 0;
}

Isn't the random engine and the variable a both stored on the thread's stack? My understanding was that thread_local should be used like so:

I took this example from https://en.cppreference.com/w/cpp/language/storage_duration

#include <iostream>
#include <string>
#include <thread>
#include <mutex>

thread_local unsigned int rage = 1; 
std::mutex cout_mutex;

void increase_rage(const std::string& thread_name)
{
    ++rage; // modifying outside a lock is okay; this is a thread-local variable
    std::lock_guard<std::mutex> lock(cout_mutex);
    std::cout << "Rage counter for " << thread_name << ": " << rage << '\n';
}

int main()
{
    std::thread a(increase_rage, "a"), b(increase_rage, "b");
 
    {
        std::lock_guard<std::mutex> lock(cout_mutex);
        std::cout << "Rage counter for main: " << rage << '\n';
    }
 
    a.join();
    b.join();
}

Possible output:

Rage counter for a: 2
Rage counter for main: 1
Rage counter for b: 2

In this particular case it makes sense to me: the variable rage is declared at global scope, but because it is declared thread_local, each thread owns its own copy, which it can modify independently of the other threads.

But then shouldn't this be equivalent to the following?

#include <iostream>
#include <string>
#include <thread>
#include <mutex>

std::mutex cout_mutex;

void increase_rage(const std::string& thread_name)
{
    unsigned int rage = 1; 
    ++rage; // no lock needed: rage is now an automatic variable, private to this call
    std::lock_guard<std::mutex> lock(cout_mutex);
    std::cout << "Rage counter for " << thread_name << ": " << rage << '\n';
}

int main()
{
    std::thread a(increase_rage, "a"), b(increase_rage, "b");
 
    {
        std::lock_guard<std::mutex> lock(cout_mutex);
        //std::cout << "Rage counter for main: " << rage << '\n';
    }
 
    a.join();
    b.join();
}

Of course, in this example rage is no longer available in the main function, yet this raises two questions:

1. Does it make sense to declare a variable thread_local inside the thread function? Or is it intended to be used only as in the cppreference example (which would make the first example legal but useless)?
2. If it does make sense to use it inside the thread function (say, in thread_function), what is the difference between the variable declared thread_local and the variable a, which is not? To me, both are stored on the thread's stack (and are "local to the thread").

Many thanks for your kind explanation.

Edit / Examples / Solution

For future readers, I thought it would be useful to add examples that show in practice the difference between a variable declared thread_local and one that is not. Thanks for all the contributions; they all helped put the pieces of the puzzle together (there are, surprisingly, very few examples on this topic on the internet at this date). Note: my question was less about storage duration and more about the difference between a variable declared inside a function called by a thread and tagged thread_local, and one that is not, ideally with concrete examples showing the difference.

Example 1: recursion

At first I didn't understand the use of thread_local within the scope of the function run by the thread, until @RaymondChen and @SolomonSlow mentioned recursion. Unless someone points it out, it isn't necessarily obvious, but the thread function may indeed need to call itself. In that case, declaring a variable thread_local within the scope of the thread function makes sense (example below). The state of the variable remains "global" (within the context of the thread) across the successive recursive calls to the thread function: in the outcome, a gets incremented while b keeps its initial value, because b is re-initialized each time the thread function is called whereas a persists.

#include <thread>
#include <mutex>
#include <iostream>

void thread_func()
{
    thread_local int a { 0 };
    int b{ 0 };

    {
        static std::mutex m;
        std::lock_guard<std::mutex> lock{ m };
        std::cout << "a: " << a++ << " b: " << b++ << std::endl;
    }

    if (a <= 2)
        thread_func();

}

int main()
{
    std::jthread a(thread_func);
    a.join();

    return 0;
}

Outcome:

a: 0 b: 0
a: 1 b: 0
a: 2 b: 0

Example 2: thread_local variable declared globally

I would expect this to be a more "typical" use of thread_local (at least it is what the cppreference example shows): a variable is declared thread_local at the global scope of the program (the c variable in this example), so it can be accessed by the main function, yet each thread has its own copy of the variable and maintains its state independently of the other threads (including the main one). This state is maintained throughout any functions that the thread function calls (in this example, thread_func calls another_func).

#include <thread>
#include <mutex>
#include <iostream>

thread_local int c{ 0 };

void another_func()
{
    c++;
}

void thread_func(int id)
{
    thread_local int a { 0 };
    int b{ 0 };

    {
        static std::mutex m;
        std::lock_guard<std::mutex> lock{ m };
        std::cout << "id: " << id << " Results -> a: " << a++ << " b: " << b++ << " c: " << c << std::endl;
    }
    another_func();

    if (a <= 2)
        thread_func(id);
}

int main()
{
    std::jthread a(thread_func, 1);
    std::jthread b(thread_func, 2);

    a.join();
    b.join();

    std::cout << "Goodbye: " << c << std::endl;

    return 0;
}

Outcome:

id: 2 Results -> a: 0 b: 0 c: 0
id: 2 Results -> a: 1 b: 0 c: 1
id: 2 Results -> a: 2 b: 0 c: 2
id: 1 Results -> a: 0 b: 0 c: 0
id: 1 Results -> a: 1 b: 0 c: 1
id: 1 Results -> a: 2 b: 0 c: 2
Goodbye: 0

Upvotes: 3

Answers (2)

Solomon Slow

Reputation: 27190

void thread_function()
{
    static thread_local std::default_random_engine gen;
    unsigned int a{ 1 };
    ...
}
Isn't the random engine and the variable a both stored on the thread's stack?

The gen variable is not stored on any stack. It's static. A static local variable can only be accessed from within the block where it is declared, but other than that, it behaves exactly like a global variable. It gets initialized one time before its first use, and then after that, it continues to exist for the lifetime of the program. Upon coming back into the block for the Nth time, it will have whatever value it had when some thread left the block for the (N-1)th time.

The gen variable is also thread_local, which means that a separate instance of it exists for each thread that enters the block, and each instance lasts until its thread exits rather than for the whole program.

The a variable is not static, and so it gets re-initialized every time any thread enters the block, and it is destroyed when the thread leaves the block.

Upvotes: 2

rustyx

Reputation: 85452

This is a question of storage duration.

1. automatic storage duration:

void increase_rage(const std::string& thread_name)
{
    unsigned int rage = 1;

Here rage is created and destroyed within the scope of each function invocation. Each time the function is invoked, a new instance is created and initialized to 1. So it won't work for the purpose of counting invocations.

2. static storage duration:

void increase_rage(const std::string& thread_name)
{
    static unsigned int rage = 1;  // or at global scope

Here the variable is allocated once in the program's data segment (not on the stack). In this case the value will persist between invocations. It will have only one value shared by all threads and will require synchronization to access from multiple threads (for example with a mutex or std::atomic<int>).

3. thread storage duration:

void increase_rage(const std::string& thread_name)
{
    static thread_local unsigned int rage = 1;

Here the variable is allocated once for each thread (in thread-local storage, TLS) and deallocated when that thread ends. It can count function invocations per thread and does not require synchronization.

Note that static is implied when thread_local is used at block scope, so we can omit it here.

Upvotes: 3
