Reputation: 31851
How can I read/write a thread local variable from another thread? That is, in Thread A I would like to access the variable in Thread B's thread local storage area. I know the ID of the other thread.
The variable is declared as __thread in GCC. The target platform is Linux, but platform independence would be nice (GCC-specific is okay, however).
Lacking a thread-start hook there is no way I can simply track this value at the start of each thread. All threads need to be tracked this way (not just specially started ones).
A higher-level wrapper like boost thread_local_storage or using pthread keys is not an option. I require the performance of a true __thread local variable.
FIRST ANSWER IS WRONG: One cannot use global variables for what I want to do. Each thread must have its own copy of the variable. Furthermore, those variables must be __thread variables for performance reasons (an equally efficient solution would also be okay, but I know of none). I also don't control the thread entry points, so there is no possibility for those threads to register any kind of structure.
Thread Local is not private: Another misunderstanding about thread-local variables. These are in no way some kind of private variable for the thread. They are globally addressable memory, with the restriction that their lifetime is tied to the thread. Any function, from any thread, if given a pointer to these variables can modify them. The question above is essentially about how to get that pointer address.
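To illustrate the point that a thread-local is ordinary addressable memory, here is a minimal sketch. It assumes the owning thread cooperates by handing out the address, which is exactly the part the question cannot rely on. The names counter and write_into_other_thread are hypothetical, and standard thread_local is used in place of __thread.

```cpp
#include <cassert>
#include <future>
#include <thread>

// One instance per thread; __thread behaves the same way under GCC.
thread_local int counter = 0;

// Hypothetical demo: thread B publishes &counter, the caller writes through
// that pointer, and B reports what it then reads from its own instance.
int write_into_other_thread(int value)
{
    std::promise<int *> address;  // B -> caller: address of B's instance
    std::promise<void> written;   // caller -> B: the write has happened
    std::promise<int> observed;   // B -> caller: value B reads afterwards

    std::thread b([&] {
        address.set_value(&counter);
        written.get_future().wait();
        observed.set_value(counter);
    });

    int *remote = address.get_future().get();
    *remote = value;              // modifies B's copy, not the caller's
    written.set_value();

    int seen = observed.get_future().get();
    b.join();
    return seen;
}
```

Calling write_into_other_thread(42) should return 42 while the caller's own counter stays untouched. The promises provide the ordering between the two threads; the pointer must not be used once B has exited.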
Upvotes: 26
Views: 16888
Reputation: 32162
This does pretty much what you need; if not, modify it to fit your requirements.
On Linux it uses pthread_key_create and on Windows it uses TlsAlloc. Both are ways of retrieving a thread-local value by key. However, if you register each thread's object when it is created, you can then access the data from other threads.
The idea of EnumerableThreadLocal is that you perform a local operation in your threads and then reduce the results back down in your main thread.
TBB has a similar class called enumerable_thread_specific, and the motivation for it can be found at https://oneapi-src.github.io/oneTBB/main/tbb_userguide/design_patterns/Divide_and_Conquer.html
The code below is an attempt to mimic the TBB code without depending on TBB. The downside is that you are limited to 1088 keys on Windows.
#include <functional>
#include <memory>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>
#if _WIN32 || _WIN64
#include <windows.h>
#else
#include <pthread.h>
#endif
template <typename T>
class EnumerableThreadLocal
{
#if _WIN32 || _WIN64
using tls_key_t = DWORD;
void create_key() { my_key = TlsAlloc(); }
void destroy_key() { TlsFree(my_key); }
void set_tls(void *value) { TlsSetValue(my_key, (LPVOID)value); }
void *get_tls() { return (void *)TlsGetValue(my_key); }
#else
using tls_key_t = pthread_key_t;
void create_key() { pthread_key_create(&my_key, nullptr); }
void destroy_key() { pthread_key_delete(my_key); }
void set_tls(void *value) const { pthread_setspecific(my_key, value); }
void *get_tls() const { return pthread_getspecific(my_key); }
#endif
std::vector<std::pair<std::thread::id, std::unique_ptr<T>>> m_thread_locals;
std::mutex m_mtx;
tls_key_t my_key;
using Factory = std::function<std::unique_ptr<T>()>;
Factory m_factory;
static auto DefaultFactory()
{
// alignas cannot be applied to a type-id; over-align T itself if false sharing matters
return std::make_unique<T>();
}
public:
EnumerableThreadLocal(Factory factory = &DefaultFactory ) : m_factory(factory)
{
create_key();
}
~EnumerableThreadLocal()
{
destroy_key();
}
EnumerableThreadLocal(const EnumerableThreadLocal &other) : m_factory(other.m_factory)
{
create_key();
// deep copy the m_thread_locals
m_thread_locals.reserve(other.m_thread_locals.size());
for (const auto &pair : other.m_thread_locals)
{
m_thread_locals.emplace_back(pair.first, std::make_unique<T>(*pair.second));
}
}
EnumerableThreadLocal &operator=(const EnumerableThreadLocal &other)
{
if (this != &other)
{
destroy_key();
create_key();
m_factory = other.m_factory;
m_thread_locals.clear();
m_thread_locals.reserve(other.m_thread_locals.size());
for (const auto &pair : other.m_thread_locals)
{
m_thread_locals.emplace_back(pair.first, std::make_unique<T>(*pair.second));
}
}
return *this;
}
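EnumerableThreadLocal(EnumerableThreadLocal &&other) noexcept
: m_factory(other.m_factory) // copied, so the moved-from object stays usable
{
// take over the key and the per-thread objects
my_key = other.my_key;
m_thread_locals = std::move(other.m_thread_locals);
// give the moved-from object a fresh key so its destructor does not delete ours
other.create_key();
}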
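EnumerableThreadLocal &operator=(EnumerableThreadLocal &&other) noexcept
{
if (this != &other)
{
destroy_key();
my_key = other.my_key;
m_thread_locals = std::move(other.m_thread_locals);
m_factory = other.m_factory; // copied, so the moved-from object stays usable
// give the moved-from object a fresh key so its destructor does not delete ours
other.create_key();
}
return *this;
}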
T *Get ()
{
void *v = get_tls();
if (v)
{
return reinterpret_cast<T *>(v);
}
else
{
const std::scoped_lock l(m_mtx);
for (const auto &[thread_id, uptr] : m_thread_locals)
{
// This search is needed when the process has run out of TLS indices (and after a copy): fall back to a slow lookup by thread id
if (thread_id == std::this_thread::get_id())
{
set_tls(reinterpret_cast<void *>(uptr.get()));
return uptr.get();
}
}
m_thread_locals.emplace_back(std::this_thread::get_id(), m_factory());
T *ptr = m_thread_locals.back().second.get();
set_tls(reinterpret_cast<void *>(ptr));
return ptr;
}
}
T const * Get() const
{
return const_cast<EnumerableThreadLocal *>(this)->Get();
}
T & operator *()
{
return *Get();
}
T const & operator *() const
{
return *Get();
}
T * operator ->()
{
return Get();
}
T const * operator ->() const
{
return Get();
}
template <typename F>
void Enumerate(F fn)
{
const std::scoped_lock lock(m_mtx);
for (auto &[thread_id, ptr] : m_thread_locals)
fn(*ptr);
}
};
And here is a suite of test cases to show how it works:
#include <algorithm>
#include <string>
#include <thread>
#include <vector>
#include "gtest/gtest.h"
#include "EnumerableThreadLocal.hpp"
TEST(EnumerableThreadLocal, BasicTest)
{
const int N = 10;
v31::EnumerableThreadLocal<std::string> tls;
// Create N threads and assign a string including the thread ID to the tls
std::vector<std::thread> threads;
for (int i = 0; i < N; ++i)
{
threads.emplace_back([&tls, i]()
{ *tls = "Thread " + std::to_string(i); });
}
// Wait for all threads to finish
for (auto &thread : threads)
thread.join();
std::vector<std::string> expected;
tls.Enumerate([&](std::string &s)
{ expected.push_back(s); });
// Sort the expected vector
std::sort(expected.begin(), expected.end());
// check the expected vector
for (int i = 0; i < N; ++i)
{
ASSERT_EQ(expected[i], "Thread " + std::to_string(i));
}
}
// Create a non copyable type, non moveable type
struct NonCopyable
{
int i=0;
NonCopyable() = default;
NonCopyable(const NonCopyable &) = delete;
NonCopyable(NonCopyable &&) = delete;
NonCopyable &operator=(const NonCopyable &) = delete;
NonCopyable &operator=(NonCopyable &&) = delete;
};
// A test to see if we can insert non moveable/ non copyable types to the tls
TEST(EnumerableThreadLocal, NonCopyableTest)
{
const int N = 10;
v31::EnumerableThreadLocal<NonCopyable> tls;
// Create N threads and have each store its index in the tls
std::vector<std::thread> threads;
for (int i = 0; i < N; ++i)
{
threads.emplace_back([&tls, i]()
{ tls->i=i; });
}
// Wait for all threads to finish
for (auto &thread : threads)
thread.join();
std::vector<int> expected;
tls.Enumerate([&](NonCopyable &s)
{ expected.push_back(s.i); });
// Sort the expected vector
std::sort(expected.begin(), expected.end());
// check the expected vector
for (int i = 0; i < N; ++i)
{
ASSERT_EQ(expected[i], i);
}
}
const int N = 10;
v31::EnumerableThreadLocal<std::string> CreateFixture()
{
v31::EnumerableThreadLocal<std::string> tls;
// Create N threads and assign a string including the thread ID to the tls
std::vector<std::thread> threads;
for (int i = 0; i < N; ++i)
{
threads.emplace_back([&tls, i]()
{ *tls = "Thread " + std::to_string(i); });
}
// Wait for all threads to finish
for (auto &thread : threads)
thread.join();
return tls;
}
void CheckFixtureCopy(v31::EnumerableThreadLocal<std::string> & tls)
{
std::vector<std::string> expected;
tls.Enumerate([&](std::string &s)
{ expected.push_back(s); });
// Sort the expected vector
std::sort(expected.begin(), expected.end());
// check the expected vector
for (int i = 0; i < N; ++i)
{
ASSERT_EQ(expected[i], "Thread " + std::to_string(i));
}
}
void CheckFixtureEmpty(v31::EnumerableThreadLocal<std::string> & tls)
{
std::vector<std::string> expected;
tls.Enumerate([&](std::string &s)
{ expected.push_back(s); });
ASSERT_EQ(expected.size(), 0);
}
/// Test for copy construct of EnumerableThreadLocal
TEST(EnumerableThreadLocal, Copy)
{
auto tls = CreateFixture();
// Copy the tls
auto tls_copy = tls;
CheckFixtureCopy(tls_copy);
CheckFixtureCopy(tls);
}
/// Test for move construct of EnumerableThreadLocal
TEST(EnumerableThreadLocal, Move)
{
auto tls = CreateFixture();
// Move the tls
auto tls_copy = std::move(tls);
CheckFixtureCopy(tls_copy);
CheckFixtureEmpty(tls);
}
/// Test for copy assign of EnumerableThreadLocal
TEST(EnumerableThreadLocal, CopyAssign)
{
auto tls = CreateFixture();
// Copy the tls
v31::EnumerableThreadLocal<std::string> tls_copy;
CheckFixtureEmpty(tls_copy);
tls_copy = tls;
CheckFixtureCopy(tls_copy);
CheckFixtureCopy(tls);
}
/// Test for move assign of EnumerableThreadLocal
TEST(EnumerableThreadLocal, MoveAssign)
{
auto tls = CreateFixture();
// Move-assign the tls
v31::EnumerableThreadLocal<std::string> tls_copy;
CheckFixtureEmpty(tls_copy);
tls_copy = std::move(tls);
CheckFixtureCopy(tls_copy);
CheckFixtureEmpty(tls);
}
//class with no default constructor
struct NoDefaultConstructor
{
int i;
NoDefaultConstructor(int i) : i(i) {}
};
// Test for using objects with no default constructor
TEST(EnumerableThreadLocal, NoDefaultConstructor)
{
const int N = 10;
v31::EnumerableThreadLocal<NoDefaultConstructor> tls([]{return std::make_unique<NoDefaultConstructor>(0);});
// Create N threads and have each store its index in the tls
std::vector<std::thread> threads;
for (int i = 0; i < N; ++i)
{
threads.emplace_back([&tls, i]()
{ tls->i = i; });
}
// Wait for all threads to finish
for (auto &thread : threads)
thread.join();
// enumerate and sort and verify
std::vector<int> expected;
tls.Enumerate([&](NoDefaultConstructor &s)
{ expected.push_back(s.i); });
// Sort the expected vector
std::sort(expected.begin(), expected.end());
// check the expected vector
for (int i = 0; i < N; ++i)
{
ASSERT_EQ(expected[i], i);
}
}
Upvotes: 0
Reputation: 31
This is an old question, but since there is no answer given, why not use a class that has its own static registration?
#include <mutex>
#include <thread>
#include <unordered_map>
struct foo;
static std::unordered_map<std::thread::id, foo*> foos;
static std::mutex foos_mutex;
struct foo
{
foo()
{
std::lock_guard<std::mutex> lk(foos_mutex);
foos[std::this_thread::get_id()] = this;
}
};
static thread_local foo tls_foo;
Of course you would need some kind of synchronization between the threads to ensure that the thread had registered the pointer, but you can then grab it from the map from any thread where you know the thread's id.
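A variation on the same idea, as a rough sketch rather than a drop-in solution: the destructor removes the entry again so the map never holds a pointer to a thread that has exited, and a hypothetical helper with_foo_of does the lookup under the lock. Note that a thread_local object with a constructor may only be constructed when the thread first uses it, so a thread that never touches tls_foo never registers. The value member, the accessor functions and demo are assumptions added for illustration.

```cpp
#include <cassert>
#include <future>
#include <mutex>
#include <thread>
#include <unordered_map>

struct foo;

// Function-local statics avoid initialisation-order problems for the registry.
std::mutex &foos_mutex() { static std::mutex m; return m; }
std::unordered_map<std::thread::id, foo *> &foos()
{
    static std::unordered_map<std::thread::id, foo *> map;
    return map;
}

struct foo
{
    int value = 0;
    foo()
    {
        std::lock_guard<std::mutex> lk(foos_mutex());
        foos()[std::this_thread::get_id()] = this;
    }
    ~foo()
    {
        // Remove the entry so no other thread keeps a dangling pointer.
        std::lock_guard<std::mutex> lk(foos_mutex());
        foos().erase(std::this_thread::get_id());
    }
};

thread_local foo tls_foo;

// Hypothetical helper: runs fn on the foo of thread `id` while holding the
// registry lock. Returns false if that thread is not (or no longer) registered.
// The owner's own unlocked accesses still need separate synchronization.
template <typename F>
bool with_foo_of(std::thread::id id, F fn)
{
    std::lock_guard<std::mutex> lk(foos_mutex());
    auto it = foos().find(id);
    if (it == foos().end())
        return false;
    fn(*it->second);
    return true;
}

// Hypothetical demo: returns the value the worker reads after another thread
// wrote 7 into its foo, or -1 if registration or deregistration failed.
int demo()
{
    std::promise<void> registered, written;
    int seen = 0;
    std::thread worker([&] {
        tls_foo.value = 1;  // first use constructs and registers tls_foo
        registered.set_value();
        written.get_future().wait();
        seen = tls_foo.value;
    });
    registered.get_future().wait();
    const std::thread::id id = worker.get_id();
    bool found = with_foo_of(id, [](foo &f) { f.value = 7; });
    written.set_value();
    worker.join();
    bool gone = !with_foo_of(id, [](foo &) {});
    return (found && gone) ? seen : -1;
}
```

Holding the lock while fn runs keeps the pointer valid for the duration of the call, because the owning thread's destructor has to take the same lock before it can unregister.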
Upvotes: 3
Reputation: 2592
I am searching for the same thing.
Since nobody had answered your question, I searched the web and arrived at the following. When compiling with GCC on Linux (Ubuntu) with -m64, the thread pointer is held in the fs segment register (gs is the one used on 32-bit x86). The selector value is 0; the hidden part of the segment register (the base, holding the linear address) points to the thread-specific area.
The first 64-bit word at that address holds the address itself (a self-pointer). The thread-local variables are stored at lower addresses.
On glibc, that address is the value returned by native_handle().
So to access another thread's local data, you go through that pointer. This relies on the variable living in the static TLS block, where its offset from the thread pointer is the same in every thread.
In other words: (char*)&variable-(char*)myThread.native_handle()+(char*)theOtherThread.native_handle()
The code below demonstrates this, assuming g++, Linux and pthreads:
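#include <atomic>
#include <cstdint>
#include <iostream>
#include <sstream>
#include <thread>
#include <pthread.h>
thread_local int B=0x11111111,A=0x22222222;
// atomic, so the spin-wait in code() is not a data race
std::atomic<bool> shouldContinue{false};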
void code(){
while(!shouldContinue);
std::stringstream ss;
ss<<" A:"<<A<<" B:"<<B<<std::endl;
std::cout<<ss.str();
}
//#define ot(th,variable)
//(*( (char*)&variable-(char*)(pthread_self())+(char*)(th.native_handle()) ))
int& ot(std::thread& th,int& v){
auto p=pthread_self();
intptr_t d=(intptr_t)&v-(intptr_t)p;
return *(int*)((char*)th.native_handle()+d);
}
int main(int argc, char **argv)
{
std::thread th1(code),th2(code),th3(code),th4(code);
ot(th1,A)=100;ot(th1,B)=110;
ot(th2,A)=200;ot(th2,B)=210;
ot(th3,A)=300;ot(th3,B)=310;
ot(th4,A)=400;ot(th4,B)=410;
shouldContinue=true;
th1.join();
th2.join();
th3.join();
th4.join();
return 0;
}
Upvotes: 5
Reputation: 31851
I was unfortunately never able to find a way to do this.
Without some kind of thread init hook there just doesn't appear to be a way to get at that pointer (short of ASM hacks that would be platform dependent).
Upvotes: 2
Reputation: 437326
If you want thread local variables that are not thread local, why don't you use global variables instead?
Important clarification!
I am not suggesting that you use a single global to replace a thread-local variable. I'm suggesting using a single global array or other suitable collection of values to replace one thread-local variable.
You will have to provide synchronization of course, but since you want to expose a value modified in thread A to thread B there's no getting around that.
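As a rough sketch of that suggestion, the following uses one global array with a slot per thread. It assumes each thread can be handed its own index, which the question says is not always possible. The names Slot, slots, kMaxThreads and sum_after_workers are hypothetical, and the 64-byte alignment is an assumption about the cache-line size.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

constexpr std::size_t kMaxThreads = 8;  // hypothetical upper bound on workers

// One slot per thread; alignas(64) assumes 64-byte cache lines and is there
// to limit false sharing between neighbouring slots.
struct alignas(64) Slot
{
    std::atomic<int> value{0};
};

std::array<Slot, kMaxThreads> slots;  // the single global collection

// Starts n workers (n <= kMaxThreads), each writing to its own slot, then
// lets the calling thread modify slot 0 and sum the first n slots.
int sum_after_workers(std::size_t n)
{
    assert(n <= kMaxThreads);
    for (auto &s : slots)
        s.value.store(0);

    std::vector<std::thread> workers;
    for (std::size_t i = 0; i < n; ++i)
        workers.emplace_back([i] { slots[i].value.store(static_cast<int>(i) + 1); });
    for (auto &w : workers)
        w.join();

    slots[0].value.fetch_add(100);  // any thread can reach any slot by index

    int sum = 0;
    for (std::size_t i = 0; i < n; ++i)
        sum += slots[i].value.load();
    return sum;
}
```

The atomics stand in for the synchronization mentioned above; a mutex per slot would serve the same purpose for larger values.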
Update:
The GCC documentation on __thread says:
When the address-of operator is applied to a thread-local variable, it is evaluated at run-time and returns the address of the current thread's instance of that variable. An address so obtained may be used by any thread. When a thread terminates, any pointers to thread-local variables in that thread become invalid.
Therefore, if you insist on going this way I imagine it's possible to get the address of a thread local variable from the thread it belongs to, just after the thread is spawned. You could then store a pointer to that memory location to a map (thread id => pointer), and let other threads access the variable this way. This assumes that you own the code for the spawned thread.
If you are really adventurous, you could try digging up information on ___tls_get_addr (start from this PDF, which is linked to by the aforementioned GCC docs). But this approach is so highly compiler- and platform-specific, and so lacking in documentation, that it should set off alarms in anyone's head.
Upvotes: 16