Reputation: 4328
I'm writing a tiny script that calls the "PNGOUT" util on a few hundred PNG files. I simply did this:
find $BASEDIR -iname "*png" -exec pngout {} \;
And then I looked at my CPU monitor and noticed that only one of the cores was being used, which is quite sad.
In this day and age of dual-, quad-, hexa- and octo-core desktops, how do I simply parallelize this task with Bash? (It's not the first time I've had such a need; quite a lot of these utilities are single-threaded... I already ran into this with MP3 encoders.)
Would simply running all the pngout processes in the background do? What would my find command look like then? (I'm not too sure how to mix find and the '&' character.)
If I have three hundred pictures, this would mean swapping between three hundred processes, which doesn't seem great anyway!?
Or should I split my three hundred or so files into "nb dirs" directories, where "nb dirs" is the number of cores, then run "nb dirs" finds concurrently? (Which would be close enough.)
But how would I do this?
Upvotes: 13
Views: 4054
Reputation: 4328
Answering my own question... It turns out there's a relatively unknown feature of the xargs command that can be used to accomplish this:
find . -iname "*png" -print0 | xargs -0 --max-procs=4 -n 1 pngout
Bingo, instant 4x speedup on a quad-core machine :)
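As a side note: on GNU systems you can avoid hardcoding the core count. A minimal variant, assuming GNU xargs (where -P is the short form of --max-procs) and coreutils' nproc to report the number of available cores:
find . -iname "*png" -print0 | xargs -0 -P "$(nproc)" -n 1 pngout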
Upvotes: 22
Reputation: 62593
To spawn all tasks in the background:
find "$BASEDIR" -iname "*png" | while IFS= read -r f; do
    pngout "$f" &
done
But of course that isn't the best option. To do 'n' tasks at a time:
NTASKS=4    # number of concurrent tasks
i=0
find "$BASEDIR" -iname "*png" | while IFS= read -r f; do
    pngout "$f" &
    i=$((i+1))
    if [[ $i -ge $NTASKS ]]; then
        wait
        i=0
    fi
done
It's not optimal, since it waits for the whole group of concurrent tasks to finish before starting another group; but it should be better than nothing.
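If you want to avoid that stop-and-go pattern, here's a rough sketch that starts a new task as soon as any running one finishes; it assumes bash 4.3 or later for wait -n:
NTASKS=4
while IFS= read -r -d '' f; do
    pngout "$f" &
    # once NTASKS jobs are running, block until any one of them exits
    while (( $(jobs -rp | wc -l) >= NTASKS )); do
        wait -n
    done
done < <(find "$BASEDIR" -iname "*png" -print0)
wait    # let the last few jobs finish
The process substitution keeps the loop in the current shell, so jobs and wait can actually see the background tasks (piping find into while would run the loop in a subshell).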
Upvotes: 4
Reputation: 7576
Parallelization is rarely trivial. In your case, if you can select the files in uniquely identified, equal-sized sets, you can run multiple copies of your find command. You don't want to fire up 300 pictures in the background, though; for jobs like this it is usually faster to run each set sequentially. Backgrounding the commands or using batch are both viable options.
Assuming the files are consecutively numbered, you could use a find pattern like "[0-4].png" for one find and "[5-9].png" for another. This would keep two cores busy for roughly the same amount of time.
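A minimal sketch of that idea, assuming the filenames end in a digit so the last digit splits the set roughly in half:
find "$BASEDIR" -iname "*[0-4].png" -exec pngout {} \; &
find "$BASEDIR" -iname "*[5-9].png" -exec pngout {} \; &
wait    # block until both halves are done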
Farming tasks out would involve a dispatcher/runner setup; building, testing, and running that would take quite a while.
Fire up BOINC to use those spare processors. You will likely want to ignore niced processes when monitoring CPU frequency. Add code like this to rc.local:
for CPU in /sys/devices/system/cpu/cpu[0-9]*; do
    echo 1 > "${CPU}/cpufreq/ondemand/ignore_nice_load"
done
Upvotes: 2