Reputation: 10247
OpenMP tries to spread out threads across the cores as evenly as possible, but how does that work?
Ultimately, the OS decides how to spread them. Does OpenMP simply recommend a placement to the OS (similar to using the likely macro or the register keyword in C)?
If we're running a job with num_threads threads on a machine with num_cores cores, none of which are currently in use, is it fair to assume that the threads will be spread evenly across all cores (and, assuming num_threads <= num_cores, that we get pure parallelism), since the OS should be working in our best interest and spreading the load nicely?
I see graphs of strong scaling where the x axis is # cores. Do we then assume that the maximum number of threads they used to run the job is <= the number of cores and that the cores were relatively idle?
Or is all of this a moot point?
Upvotes: 0
Views: 281
Reputation: 9519
The scheduling of the OpenMP threads on the cores and/or hardware threads of the machine is mostly the responsibility of the operating system. It will decide based on its own heuristics where and when to start / stop / migrate them...
However, OpenMP gives you some tools to direct or restrict the choices available to the OS when it makes those decisions. For example, you can control:
- the number of threads: OMP_NUM_THREADS environment variable, num_threads clause, omp_set_num_threads() function;
- the places where threads may run: OMP_PLACES environment variable;
- the binding policy: OMP_PROC_BIND environment variable, proc_bind clause.
With that, you have some level of control to steer the OS's decisions, but ultimately the OS remains in control of the actual scheduling. Its decisions are not always what you would expect (especially when you don't use placement or binding), since the machine workload and the global scheduling policy it applies might interfere with what you think would be optimal for your code. For example, on a NUMA (Non-Uniform Memory Access) machine, considerations such as how much memory is in use on each node and which memory segment belongs to which process might prevent a seemingly even spread of threads across chips, leading to local CPU contention...
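To see what the runtime actually does with these settings, here is a minimal sketch in C that requests a team size with the num_threads clause, asks for the spread policy with the proc_bind clause, and records the place each thread reports through omp_get_place_num() (OpenMP 4.5). The helper names record_thread_places and print_thread_places are made up for this illustration, and requested is assumed to be positive. The sketch falls back to a single unbound thread when compiled without OpenMP support.

```c
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/*
 * Hypothetical helper (not part of OpenMP): runs a parallel region with the
 * requested team size and the "spread" binding policy, and stores in
 * place_of[t] the place number thread t reports (-1 if it is not bound).
 * Returns the team size the runtime actually granted.
 */
int record_thread_places(int requested, int *place_of, int capacity)
{
    int team_size = 1;
#ifdef _OPENMP
    #pragma omp parallel num_threads(requested) proc_bind(spread)
    {
        int tid = omp_get_thread_num();
        if (tid < capacity)
            place_of[tid] = omp_get_place_num();

        /* One thread records the team size; the runtime may grant fewer
           threads than requested. */
        #pragma omp single
        team_size = omp_get_num_threads();
    }
#else
    /* Built without OpenMP support: one thread, no binding information. */
    (void)requested;
    if (capacity > 0)
        place_of[0] = -1;
#endif
    return team_size;
}

/* Hypothetical helper: prints the first `count` recorded entries. */
void print_thread_places(const int *place_of, int count)
{
    for (int t = 0; t < count; ++t)
        printf("thread %d -> place %d\n", t, place_of[t]);
}
```

Built with something like gcc -fopenmp and called from a main() of your own, the output depends on the runtime and the environment. With OMP_PLACES=cores and OMP_PROC_BIND=spread set, you would typically expect distinct place numbers. With neither set, many runtimes leave the threads unbound and report -1, in which case the OS is free to move them around as described above.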
Upvotes: 1