pythonic

Reputation: 21615

Thread-local pointer pointing to non-thread-local data

I have an array, which is not thread local, such as the following.

long array[NTHREADS];

Here array[0] is managed by thread 0, array[1] by thread 1, and so on. We did not use a thread-local variable because at some point threads also have to read other threads' parts. Most of the time, however, they modify their own part. Of course, we could modify the data using array[thread_id], but to speed up execution I want to use a pointer.

Since each thread manages its own data, the pointer should be thread local and assigned once at the beginning. So I need something like this (in gcc syntax).

  __thread long* tl_ptr;

  tl_ptr = &array[threadid];

This way, I can modify the thread-specific data through *tl_ptr. My question is whether this approach is correct. Are there any problems with it?

Upvotes: 1

Views: 480

Answers (2)

Gunther Piez

Reputation: 30439

If you are writing the program in a pre-C++11 dialect, strictly speaking the result will not be portable: the older C++ dialects don't have any notion of threads.

But if you use C++11 (which can possibly be achieved by declaring "This is now C++11", without changing a single byte of source, and adding a compiler flag) and stick to the threading facilities of the language (thread_local instead of __thread), portability is guaranteed by the language.

In real life, most pre-C++11 implementations work reasonably well with threads. gcc on x86_64, for example, will work: gcc has built-in mechanisms for threading, and x86_64 is an architecture with a strong memory model (read about the details here in the memory ordering chapter).
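As a minimal sketch of the questioner's approach in C++11 terms, the following binds a thread_local pointer to each thread's slot of a shared array. The names worker and run_workers, the thread count of 4, and the increment workload are assumptions made for illustration; they are not from the question. Other threads read the slots only after join(), which is the simplest way to stay within the language's guarantees.

```cpp
#include <cstddef>
#include <thread>
#include <vector>

constexpr std::size_t NTHREADS = 4;   // assumed thread count

long array[NTHREADS];                 // shared array, one slot per thread

thread_local long* tl_ptr = nullptr;  // every thread has its own copy of this pointer

// Hypothetical worker: binds the pointer once, then updates only its own slot.
void worker(std::size_t thread_id, long iterations) {
    tl_ptr = &array[thread_id];
    *tl_ptr = 0;
    for (long i = 0; i < iterations; ++i)
        ++*tl_ptr;
}

// Starts the workers, waits for all of them, then reads every slot.
long run_workers(long iterations) {
    std::vector<std::thread> threads;
    for (std::size_t id = 0; id < NTHREADS; ++id)
        threads.emplace_back(worker, id, iterations);
    for (auto& t : threads)
        t.join();  // after join(), the workers' writes are visible to this thread
    long sum = 0;
    for (std::size_t id = 0; id < NTHREADS; ++id)
        sum += array[id];
    return sum;
}
```

Calling run_workers(1000) from main should return NTHREADS * 1000. With gcc this needs -std=c++11 (or later) and -pthread.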

That said, you use thread-local storage in an unusual way. The usual arrangement is

__thread long array_data;
long* array[NTHREADS];  // non-thread-local pointers to thread-local data

Now if you modify array_data in any thread, only that thread's copy of array_data is modified. If you need to access array_data from a different thread, say a supervisor thread that sums up all the array_data values, you need an array that holds pointers to all the thread locals, whose elements are initialized at the startup of each thread with

array[iThread] = &array_data;  // where iThread is an index for each thread

Access from a supervisor thread would look like

long sum=0;
for (int i=0; i<NTHREADS; ++i)
    sum += *array[i];

You need to make sure that access to thread locals from other threads is serialized or protected with mutexes where appropriate. In your case (x86 and gcc) the summation loop will just work, since aligned longs are accessed atomically by your hardware and gcc treats the pointer as unrestricted, but be careful.
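If other threads must read the values while the owners are still writing, one portable option in C++11 is to make each slot a std::atomic<long>. Note that this swaps atomics in for the mutexes mentioned above; it is a sketch under that assumption, not the only way to do it. The names counters, my_counter, count_up, snapshot_sum and run_with_supervisor, and the thread count of 4, are hypothetical.

```cpp
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

constexpr std::size_t kThreads = 4;    // assumed thread count

std::atomic<long> counters[kThreads];  // shared slots, one per thread

thread_local std::atomic<long>* my_counter = nullptr;  // per-thread pointer to own slot

// Hypothetical worker: increments only its own slot through the thread-local pointer.
void count_up(std::size_t id, long iterations) {
    my_counter = &counters[id];
    for (long i = 0; i < iterations; ++i)
        my_counter->fetch_add(1, std::memory_order_relaxed);
}

// May be called from any thread at any time; each load is atomic,
// but the total is only a snapshot while workers are still running.
long snapshot_sum() {
    long sum = 0;
    for (std::size_t i = 0; i < kThreads; ++i)
        sum += counters[i].load(std::memory_order_relaxed);
    return sum;
}

long run_with_supervisor(long iterations) {
    for (auto& c : counters)
        c.store(0);                    // reset before any worker starts
    std::vector<std::thread> workers;
    for (std::size_t id = 0; id < kThreads; ++id)
        workers.emplace_back(count_up, id, iterations);
    long partial = snapshot_sum();     // concurrent read: some value between 0 and the final total
    (void)partial;
    for (auto& w : workers)
        w.join();
    return snapshot_sum();             // all workers finished: exact total
}
```

Relaxed ordering is enough here because each slot is an independent counter; if the values published other data, stronger orderings or a mutex would be needed.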

Upvotes: 0

spraff

Reputation: 33395

C++11 has a memory model which defines behaviour in these circumstances. C (before C11) and C++03 and earlier are basically single-threaded at heart: they have no native atomics or fences.

This means that unless you use a C++11 compiler (which implements the memory model) you might get strange effects due to cache coherency problems etc. This is cpu-specific.

If you know which processor you'll be running on, and that it has a suitably "strong" memory model, you can establish that your approach is safe, but it won't be portable.

Upvotes: 1
