Reputation: 1405
I have written a program with several loops parallelized using OpenMP. When a single instance of the program runs on its own, the run time is quite low. But when I run multiple instances at the same time, the run time of each instance is much higher than expected.
Using Zoom, I profiled a single instance (when 70 were running in parallel), and the profile shows that 65% of the time is consumed by OpenMP (please see the image below). Could that be correct?
Upvotes: 2
Views: 489
Reputation: 6537
It looks like you've oversubscribed the machine: there are more runnable threads than cores. The OpenMP runtime then wastes time in spin-loops, waiting for threads that have been preempted. Even if the spin-loops are oversubscription-aware (i.e. they call sched_yield() periodically), time is still lost to frequent context switches and other overheads.
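As a rough illustration of the oversubscription check, here is a minimal sketch in C. The helper names `threads_per_instance` and `is_oversubscribed` are hypothetical, and the core counts are assumptions (the question does not say how many cores the machine has). The idea is simply that instances times threads per instance should not exceed the number of cores.

```c
/* Hypothetical helper: the largest per-instance thread count such that
 * instances * threads does not exceed the number of cores.
 * Never returns less than 1, since each instance needs one thread. */
int threads_per_instance(int cores, int instances)
{
    if (cores < 1 || instances < 1)
        return 1;                 /* invalid input: fall back to one thread */
    int t = cores / instances;    /* integer division rounds down */
    return t < 1 ? 1 : t;
}

/* Hypothetical helper: returns 1 if the total number of threads across
 * all instances exceeds the number of cores, 0 otherwise. */
int is_oversubscribed(int cores, int instances, int threads_each)
{
    return (long)instances * threads_each > cores;
}
```

For example, on an assumed 16-core machine, 70 instances are oversubscribed even with one thread each, so fewer concurrent instances would be needed as well. The computed value could be passed to omp_set_num_threads() or set through the OMP_NUM_THREADS environment variable. Setting OMP_WAIT_POLICY=passive may also reduce the time spent spinning, though how much it helps depends on the OpenMP runtime.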
Upvotes: 1