Reputation: 53
I am using a python script to perform some calculations in my image and save the array obtained into a .png file. I deal with 3000 to 4000 images. To perform all these I use a shell script in Ubuntu. It gets the job done. But is there anyway to make it fast. I have 4 cores in my machine. How to use all of them. The script I am using is below
#!/bin/bash
cd $1
for i in $(ls *.png)
do
python ../tempcalc12.py $i
done
cd ..
tempcalc12.py is my python script
This question might be trivial. But I am really new to programming.
Thank you
Upvotes: 2
Views: 1276
Reputation: 33685
If you have GNU Parallel you can do:
parallel python ../tempcalc12.py ::: *.png
It will do The Right Thing by spawning a job per core, even if the names your PNGs have space, ', or " in them. It also makes sure the output from different jobs are not mixed together, so if you use the output you are guaranteed that you will not get half-a-line from two different jobs.
GNU Parallel is a general parallelizer and makes is easy to run jobs in parallel on the same machine or on multiple machines you have ssh access to.
If you have 32 different jobs you want to run on 4 CPUs, a straight forward way to parallelize is to run 8 jobs on each CPU:
GNU Parallel instead spawns a new process when one finishes - keeping the CPUs active and thus saving time:
Installation
If GNU Parallel is not packaged for your distribution, you can do a personal installation, which does not require root access. It can be done in 10 seconds by doing this:
(wget -O - pi.dk/3 || curl pi.dk/3/ || fetch -o - http://pi.dk/3) | bash
For other installation options see http://git.savannah.gnu.org/cgit/parallel.git/tree/README
Learn more
See more examples: http://www.gnu.org/software/parallel/man.html
Watch the intro videos: https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1
Walk through the tutorial: http://www.gnu.org/software/parallel/parallel_tutorial.html
Sign up for the email list to get support: https://lists.gnu.org/mailman/listinfo/parallel
Upvotes: 0
Reputation: 12090
xargs
has --max-procs= ( or -P)
option which does the job in parallel.
The following code does the job in maximum of 4 processes.
ls *.png | xargs -n 1 -P 4 python ../tempcalc12.py
Upvotes: 3
Reputation: 813
You can just add a & to the python line to have everything executed in parallel:
python ../tempcalc12.py $i &
This is a bad idea though, as having too many processes will just slow everything down. What you can do is limit the number of threads, like this:
MAX_THREADS=4
for i in $(ls *.png); do
python ../tempcalc12.py $i &
while [ $( jobs | wc -l ) -ge "$MAX_THREADS" ]; do
sleep 0.1
done
done
Every 100ms, it will check the number of running jobs, and if it is inferior to MAX_THREADS, add new jobs in background.
This is a nice hack if you just want a quick working solution, but you might also want to investigate what GNU Parallel can do.
Upvotes: 1