Hielke Walinga

Reputation: 2845

GNU Parallel not using all processors

I am using GNU parallel to speed up a process. However, GNU parallel does not use all cores on my machine. I wonder what the limiting factor here is.

The command:

find data -type f | parallel --pipe -P 70 python program.py > output 

However, it only uses 4 of 70 cores. I wonder if anybody knows if there are other limitations that make it only use 4 cores.

Upvotes: 1

Views: 1319

Answers (1)

Ole Tange

Reputation: 33685

I do not know what program.py does, but it is very uncommon to use --pipe together with find. So I think this is what you want:

find data -type f | parallel -P 70 python program.py > output 
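The two modes differ in how a filename reaches program.py: without --pipe, GNU parallel appends each filename to the command line of a job, while with --pipe it sends chunks of find's output to the job's stdin. Since the question does not show program.py, the following is only a hypothetical sketch (the helper name collect_filenames is made up for illustration) of a script that could cope with both styles:

```python
import io
import sys


def collect_filenames(argv, stdin):
    """Return the filenames this process should handle.

    Hypothetical helper: the real program.py is not shown in the question.
    """
    args = argv[1:]
    if args:
        # Without --pipe: parallel appends the filename as an argument.
        return args
    # With --pipe: filenames arrive on stdin, one per line.
    return [line.rstrip("\n") for line in stdin if line.strip()]


# Simulate both invocation styles without touching the real stdin.
print(collect_filenames(["program.py", "data/a.txt"], io.StringIO("")))
print(collect_filenames(["program.py"], io.StringIO("data/a.txt\ndata/b.txt\n")))
```

In a real script the call would presumably be collect_filenames(sys.argv, sys.stdin).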

With --pipe, the output from find must be at least 70 MB for this to run 70 jobs in parallel, because the default --block-size is 1 MB:

find data -type f | parallel --pipe -P 70 python program.py > output 
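As a rough back-of-the-envelope check, the number of jobs that can run at once with --pipe is bounded by the number of blocks the input splits into. The sketch below is only an approximation: it assumes the default block is 1024 * 1024 bytes and ignores that GNU parallel cuts blocks at line boundaries. The function name estimated_jobs is made up for illustration.

```python
import math

# Assumption: the default --block-size of 1M means 1024 * 1024 bytes.
DEFAULT_BLOCK_SIZE = 1024 * 1024


def estimated_jobs(input_bytes, block_size=DEFAULT_BLOCK_SIZE, max_jobs=70):
    """Approximate upper bound on concurrent jobs for a given input size."""
    if input_bytes <= 0:
        return 0
    # Each block feeds one job, so the block count caps the parallelism.
    blocks = math.ceil(input_bytes / block_size)
    return min(max_jobs, blocks)


# About 4 MB of filenames yields only 4 blocks, hence only 4 jobs.
print(estimated_jobs(4 * DEFAULT_BLOCK_SIZE))
```

Under this model, seeing 4 busy cores suggests that find produced roughly 4 MB of output.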

If program.py really reads filenames on stdin, then you should probably use --round-robin with a smaller --block:

find data -type f | parallel --pipe --block 1k --round-robin -P 70 python program.py > output

This takes the input from find and gives the first kilobyte to the first job, the 70th kilobyte to the 70th job, and the 71st kilobyte to the first job again.
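The distribution described above can be modeled in a few lines. This is an idealized sketch, not GNU parallel's actual implementation: it cuts the input at fixed byte offsets and assigns block i to job i modulo the job count, whereas the real tool splits at line boundaries and the order in which running jobs receive blocks may differ. The function name round_robin_blocks is made up for illustration.

```python
def round_robin_blocks(data, block_size=1000, jobs=70):
    """Map job index -> list of blocks, giving block i to job i % jobs."""
    assignment = {}
    for index, start in enumerate(range(0, len(data), block_size)):
        job = index % jobs
        assignment.setdefault(job, []).append(data[start:start + block_size])
    return assignment


# 71 blocks of 1000 bytes: job 0 gets the 1st and 71st block, every other job gets one.
data = b"x" * 71_000
assignment = round_robin_blocks(data)
print(len(assignment), len(assignment[0]), len(assignment[69]))
```

With a 1k block, even a few hundred kilobytes of filenames is enough to keep all 70 jobs supplied with input.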

Upvotes: 2
