Mohammad Siavashi

Reputation: 1262

Why does Linux distribute threads among NUMA nodes almost equally?

I'm running an application with multiple threads, and it seems Linux distributes the threads among the NUMA nodes almost equally. Say my application spawns 4 threads and my machine has 4 sockets: I observe that each thread is assigned to a different NUMA node, so the threads are spread among all nodes almost equally.

Is there any reason for this? Why not assign them all to one socket and then fill the next one?

Upvotes: 1

Views: 1019

Answers (1)

Jérôme Richard

Reputation: 50338

The best binding for an application depends on what the application does. It is often a good idea to spread threads over different NUMA nodes so as to maximize the memory throughput, since all NUMA nodes can theoretically be used in this case (assuming the application is well written and NUMA-aware). If all threads are bound to the same NUMA node, then only the memory of that node can be accessed efficiently: accessing the memory of another NUMA node is possible but slower, and pages will not be mapped efficiently automatically because of the first-touch policy, which is generally the default one on most machines. Conversely, when some threads communicate a lot, it is often better to put them on the same NUMA node so as not to pay latency overheads. In some cases, it can even be better to put them on the same core (but different hardware threads) so as to speed up synchronization operations like locks and atomics.
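To make the first-touch point concrete, here is a minimal sketch in C with OpenMP (the array size is arbitrary and the code is illustrative, not specific to your application). The idea is that each page is physically allocated on the NUMA node of the thread that first writes it, so initializing in parallel with the same schedule as later accesses keeps memory node-local:

```c
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

int main(void)
{
    const size_t n = 16 * 1024 * 1024;          /* ~128 MiB of doubles */
    double *a = malloc(n * sizeof(double));
    if (a == NULL)
        return EXIT_FAILURE;

    /* First touch: each thread writes its own chunk, so with the default
     * first-touch policy the OS maps the backing pages on the NUMA node
     * of the touching thread. */
    #pragma omp parallel for schedule(static)
    for (size_t i = 0; i < n; i++)
        a[i] = 0.0;

    /* Later parallel accesses with the same static schedule then hit
     * mostly node-local memory. */
    #pragma omp parallel for schedule(static)
    for (size_t i = 0; i < n; i++)
        a[i] = 2.0 * a[i] + 1.0;

    printf("a[0] = %f\n", a[0]);
    free(a);
    return EXIT_SUCCESS;
}
```

Note that this only pays off if the threads stay where they first touched the pages, which is exactly why binding matters.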

If you want the scheduling and the binding to be efficient, you need to provide more information to the OS or do it yourself. I strongly advise you to bind threads to specific cores. This is easy with HPC runtimes/tools like OpenMP (but a pain if your application uses low-level threads, unless you do not care about platform portability). As for NUMA, you can specify the memory policy using numactl. More information is provided in this answer.
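As a sketch of what that looks like on the OpenMP side: the environment variables OMP_PLACES and OMP_PROC_BIND are standard OpenMP, and the numactl flags below come from the usual Linux numactl tool (the file name bind_check.c is just an example):

```c
#include <stdio.h>
#include <omp.h>

/* Build:  gcc -fopenmp bind_check.c -o bind_check
 * Run, for example:
 *   OMP_PLACES=cores OMP_PROC_BIND=spread ./bind_check   # spread over sockets
 *   OMP_PLACES=cores OMP_PROC_BIND=close  ./bind_check   # pack threads together
 *   numactl --interleave=all ./bind_check                # interleave memory pages
 */
int main(void)
{
    #pragma omp parallel
    {
        /* omp_get_place_num() reports which place (here: core) the calling
         * thread is bound to, or -1 if it is not bound to any place, so
         * the effect of the binding can be checked directly. */
        printf("thread %d runs on place %d of %d\n",
               omp_get_thread_num(), omp_get_place_num(),
               omp_get_num_places());
    }
    return 0;
}
```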

In practice, HPC applications generally use manual binding to improve performance. OS schedulers are generally not very good at binding threads automatically and efficiently. A few years ago, there were even bugs in the scheduler causing inefficient behaviour: see The Linux Scheduler: a Decade of Wasted Cores. To my knowledge, such problems are not uncommon in this field and are not restricted to Linux. Efficient NUMA-aware OS scheduling is far from easy.

Upvotes: 2
