OpenMP thread affinities with static variables

Question

I have a C++ class C which contains some code, including a static variable which is meant to only be read, and perhaps a constexpr static function. For example:

template
class C {
   public:
     //some functions
     void func1();
     void func2()
     static constexpr std::size_t sfunc1(){ return T; }


   private:
     std::size_t var1;
     std::array array1;
     static int svar1;

}

The idea is to use the thread affinity mechanisms of openMP 4.5 to control the socket (NUMA architecture) where various instances of this class are executed (and therefore also place it in a memory location close to the socket to avoid using the interconnect between the NUMA nodes). It is my understanding that since this code contains a static variable it is effectively shared between all class instances so I won't have control of the memory location where the static variable will be placed, upon thread creation. Is this correct? But I presume the other non-static variables will be located at memory locations close to the socket being used? Thanks

Gem Taylor · Accepted Answer

You have to assume that the thread stack, thread-bound malloc, and thread local storage will allocate to the thread's "local" memory - so any auto or new variables should be optimised at least on the thread they were created on, though I don't know which compilers support that kind of allocation model; but as you say, static non-const data can only exist in one location. I guess if the compiler recognises const segments or constructed const segments, then after construction they could be duplicated per zone and then mapped to the same logical address? Again don't know if compilers are doing that automagically.

Non-const statics are going to be troublesome. Presumably these statics are helping to perform some sort of thread synchronisation. If they contain flags that are read often and written rarely then for best performance it may be better for the writer to write to a number of registered copies (one per zone) and each thread uses a thread-local pointer to the appropriate zone copy, than half (or 3/4) the readers are always slow. Of course, that ceases to be a simple atomic write, and a single mutex just puts you back where you started. I suspect this is roll-your-own code land.

The simple case that shouldn't be forgotten: if objects are passed between threads, then potentially a thread could be accessing a non-local object.

OpenMP thread affinities with static variables

Answers (1)

Related Questions