Reputation: 31
I have a matrix spread across the local memories of four NUMA nodes. Now I want to start 4 threads, each on a CPU belonging to a different NUMA node, so that each thread can access its part of the matrix as fast as possible. OpenMP has the "proc_bind(spread)" option, but it places the threads on the same NUMA node, just on CPUs that are far apart.
How can I force the threads to bind to different NUMA nodes?
Or, if that is not possible: When I use all cores on all nodes (256 threads total), I know how to get the ID of the NUMA node, but I can't control which thread gets which indices e.g. in a for loop. How could I distribute my workload efficiently with respect to the NUMA configuration?
Upvotes: 3
Views: 1085
Reputation: 9519
Here is what I'd do:
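1. Check the core numbering with numactl -H to find out which core IDs belong to each of the 4 NUMA nodes.
2. Pick one core from each node and use OMP_PLACES to bind the threads to these cores: export OMP_PLACES="{0},{1},{2},{3}" (this assumes cores 0, 1, 2 and 3 sit on four different nodes; otherwise substitute the IDs reported by numactl -H).
3. Run the program with numactl -l myBinary so that memory is allocated on the node local to the allocating thread.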
From what I understood of your question, that should work.
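On the second part of the question (controlling which thread gets which indices), one possible approach is to loop over the nodes rather than over the rows and let each iteration process one node's block. The sketch below is only an illustration under assumptions that are not in the question: the matrix is row-major, its rows are split into contiguous, nearly equal blocks, and block k lives on node k. The names block_begin and sum_blocks are hypothetical helpers, and the row sum stands in for the real work.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: first row of the block owned by `node` when `rows`
   rows are split into `nodes` contiguous, nearly equal blocks.
   Calling it with node == nodes returns `rows` (one past the last block). */
static size_t block_begin(size_t rows, int nodes, int node)
{
    assert(nodes > 0 && node >= 0 && node <= nodes);
    size_t base  = rows / (size_t)nodes;   /* rows every block gets       */
    size_t extra = rows % (size_t)nodes;   /* first `extra` blocks get +1 */
    size_t n     = (size_t)node;
    return n * base + (n < extra ? n : extra);
}

/* Hypothetical example workload: sum each node's block of a row-major
   rows x cols matrix into partial[node]. */
static void sum_blocks(const double *a, size_t rows, size_t cols,
                       int nodes, double *partial)
{
    /* One iteration per node. With schedule(static, 1), chunks of one
       iteration are dealt out round-robin in thread-number order, so
       iteration k runs on thread k if the team really has `nodes` threads.
       Assumption: block k of the matrix is resident on node k. */
    #pragma omp parallel for num_threads(nodes) schedule(static, 1)
    for (int node = 0; node < nodes; ++node) {
        size_t lo = block_begin(rows, nodes, node);
        size_t hi = block_begin(rows, nodes, node + 1);
        double s = 0.0;
        for (size_t i = lo * cols; i < hi * cols; ++i)
            s += a[i];
        partial[node] = s;
    }
}
```

Built with an OpenMP flag such as gcc -fopenmp and run with OMP_PLACES listing one core per node, thread k would typically end up on place k when OMP_PROC_BIND is also set (for example to close), but that mapping is worth verifying on the actual machine. If the runtime delivers fewer threads than requested, the results stay correct and only the thread-to-block mapping changes.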
Upvotes: 3