Reputation: 1
I use PPL on a 2-socket Windows machine (16C/32T x 2 = 64 logical cores).
concurrency::CurrentScheduler::GetNumberOfVirtualProcessors() reports 64 processors.
But concurrency::parallel_for uses only the first socket, and total CPU usage never reaches 100%.
How can I use all sockets (all NUMA nodes) with one parallel_for?
Upvotes: -3
Views: 39
Reputation: 5703
I think you got it wrong...
The concurrency::parallel_for function in the PPL runs on the current scheduler (the default one unless you attach your own), so it may NOT distribute the workload evenly across all sockets.
So you need to set up the scheduler explicitly and assign work to each socket yourself. It would be something like this:
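#include <ppl.h>
#include <concrt.h>
#include <algorithm>

int main()
{
    // Request as many virtual processors as the hardware offers.
    concurrency::SchedulerPolicy policy(2,
        concurrency::MinConcurrency, 1,
        concurrency::MaxConcurrency, concurrency::MaxExecutionResources);
    concurrency::CurrentScheduler::Create(policy);

    const int count = 100;
    const int nodes = static_cast<int>(concurrency::GetProcessorNodeCount());
    const int chunk = (count + nodes - 1) / nodes;

    concurrency::task_group tasks;
    for (int n = 0; n < nodes; ++n)
    {
        const int first = n * chunk;
        const int last = (std::min)(count, first + chunk);
        // Placement hint: ask the runtime to run this slice on NUMA node n.
        concurrency::location node =
            concurrency::location::from_numa_node(static_cast<unsigned short>(n));
        tasks.run([first, last] {
            concurrency::parallel_for(first, last, [](int i) {
                // Your parallel code here.
            });
        }, node);
    }
    tasks.wait();

    concurrency::CurrentScheduler::Detach();
    return 0;
}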
It's just a concept; I have NOT tested it yet.
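The part about giving each socket its own share of the iterations can at least be checked without PPL. Below is a minimal sketch in portable C++; `split_range` is a hypothetical helper name (not part of PPL or ConcRT), and it assumes you want one contiguous, near-equal slice of the index range per NUMA node.

```cpp
#include <utility>
#include <vector>

// Hypothetical helper (not a PPL API): split [first, last) into at most
// `parts` contiguous, non-empty slices of near-equal size.
std::vector<std::pair<int, int>> split_range(int first, int last, int parts)
{
    std::vector<std::pair<int, int>> slices;
    if (parts <= 0 || last <= first)
        return slices;

    const int total = last - first;
    const int base = total / parts;   // minimum size of every slice
    const int extra = total % parts;  // the first `extra` slices get one more

    int begin = first;
    for (int p = 0; p < parts && begin < last; ++p)
    {
        const int size = base + (p < extra ? 1 : 0);
        if (size == 0)
            break;  // more parts than iterations: stop early
        slices.emplace_back(begin, begin + size);
        begin += size;
    }
    return slices;
}
```

With 2 nodes and the range 0..100 this gives [0, 50) and [50, 100); each slice would then go to its own parallel_for, one per socket.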
Upvotes: 0