Niroop

Reputation: 53

Multi-threading using shell script

I am using a Python script to perform some calculations on an image and save the resulting array to a .png file. I have 3000 to 4000 images to process, and I run the script over all of them with a shell script on Ubuntu. It gets the job done, but is there any way to make it faster? My machine has 4 cores. How can I use all of them? The script I am using is below:

#!/bin/bash
cd "$1" || exit 1
for i in *.png
do
python ../tempcalc12.py "$i"
done
cd ..

tempcalc12.py is my python script

This question might be trivial. But I am really new to programming.

Thank you

Upvotes: 2

Views: 1276

Answers (3)

Ole Tange

Reputation: 33685

If you have GNU Parallel you can do:

parallel python ../tempcalc12.py ::: *.png

It will do The Right Thing by spawning a job per core, even if the names of your PNGs contain spaces, ', or ". It also makes sure the output from different jobs is not mixed together, so if you use the output you are guaranteed not to get half a line from two different jobs.

GNU Parallel is a general parallelizer and makes it easy to run jobs in parallel on the same machine or on multiple machines you have ssh access to.

If you have 32 different jobs you want to run on 4 CPUs, a straightforward way to parallelize is to run 8 jobs on each CPU:

[Figure: Simple scheduling]

GNU Parallel instead spawns a new process when one finishes - keeping the CPUs active and thus saving time:

[Figure: GNU Parallel scheduling]
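To see why the second strategy saves time, here is a toy model in bash. It is only a sketch: the job durations are made up, it uses 2 workers and 8 jobs to keep the numbers small, and it says nothing about how GNU Parallel is implemented internally. It compares a fixed up-front split against handing each job to whichever worker is free first.

```shell
#!/bin/bash
# Hypothetical job durations in seconds (invented for illustration).
durations=(8 6 4 2 1 1 1 1)
workers=2

# Static split: each worker gets a fixed, contiguous block of jobs up front.
per_worker=$(( ${#durations[@]} / workers ))
static_makespan=0
for (( w = 0; w < workers; w++ )); do
    total=0
    for (( j = w * per_worker; j < (w + 1) * per_worker; j++ )); do
        total=$(( total + durations[j] ))
    done
    (( total > static_makespan )) && static_makespan=$total
done

# Dynamic: the next job goes to the worker that becomes free first,
# which is the worker with the smallest accumulated load so far.
load=()
for (( w = 0; w < workers; w++ )); do load[w]=0; done
for d in "${durations[@]}"; do
    min=0
    for (( w = 1; w < workers; w++ )); do
        (( load[w] < load[min] )) && min=$w
    done
    load[min]=$(( load[min] + d ))
done
dynamic_makespan=0
for l in "${load[@]}"; do
    (( l > dynamic_makespan )) && dynamic_makespan=$l
done

echo "static: ${static_makespan}s, dynamic: ${dynamic_makespan}s"
# prints: static: 20s, dynamic: 12s
```

With the static split, one worker sits idle for 16 seconds while the other works through all the long jobs. With dynamic assignment both workers finish at the same time.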

Installation

If GNU Parallel is not packaged for your distribution, you can do a personal installation, which does not require root access. It can be done in 10 seconds by doing this:

(wget -O - pi.dk/3 || curl pi.dk/3/ || fetch -o - http://pi.dk/3) | bash

For other installation options see http://git.savannah.gnu.org/cgit/parallel.git/tree/README

Learn more

See more examples: http://www.gnu.org/software/parallel/man.html

Watch the intro videos: https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1

Walk through the tutorial: http://www.gnu.org/software/parallel/parallel_tutorial.html

Sign up for the email list to get support: https://lists.gnu.org/mailman/listinfo/parallel

Upvotes: 0

ymonad

Reputation: 12090

xargs has a --max-procs= (or -P) option that runs jobs in parallel.
The following command runs at most 4 processes at a time.

ls *.png | xargs -n 1 -P 4 python ../tempcalc12.py
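One caveat: piping ls into xargs splits on whitespace, so a file name containing a space or a quote will be mangled. A possible variant, assuming your xargs supports -0 (GNU and BSD xargs both do), passes NUL-terminated names instead. The sketch below runs in a throwaway directory, and the `sh -c` command is a hypothetical stand-in for `python ../tempcalc12.py` so that it can be tried without the real script.

```shell
#!/bin/bash
# Demo setup: a scratch directory with awkward file names.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch plain.png "with space.png" "it's.png"

# printf emits each name followed by a NUL byte; xargs -0 splits on NUL,
# so spaces and quotes in names pass through unchanged.
# -n 1: one file per invocation; -P 4: at most 4 processes at once.
# Real use would end with: xargs -0 -n 1 -P 4 python ../tempcalc12.py
printf '%s\0' *.png |
    xargs -0 -n 1 -P 4 sh -c ': > "$1.done"' _

# Each stand-in job leaves a marker file next to its input.
done_files=(*.done)
echo "processed ${#done_files[@]} files"
```

If you would rather not hard-code the 4, `-P "$(nproc)"` should pick up the core count on systems that ship GNU coreutils.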

Upvotes: 3

Antoine Pietri

Reputation: 813

You can just add a & to the python line to have everything executed in parallel:

python ../tempcalc12.py $i &

This is a bad idea though, as having too many processes will just slow everything down. What you can do is limit the number of processes running at the same time, like this:

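MAX_THREADS=4
for i in *.png; do
    python ../tempcalc12.py "$i" &
    # Pause while the number of running background jobs is at the limit
    while [ "$(jobs -r | wc -l)" -ge "$MAX_THREADS" ]; do
        sleep 0.1
    done
done
wait  # let the last jobs finish before the script exits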

Every 100 ms, it checks the number of running jobs, and once that number drops below MAX_THREADS, it starts the next job in the background.
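If your bash is 4.3 or newer, the polling can probably be replaced with `wait -n`, which blocks until any one background job exits. This is a sketch under that assumption, not a tested replacement for the loop above. `process_one` is a hypothetical stand-in for `python ../tempcalc12.py "$i"`, and the scratch directory is only there so the example can be run on its own.

```shell
#!/bin/bash
# Requires bash 4.3 or newer for `wait -n`.
max_jobs=4

# Hypothetical stand-in for: python ../tempcalc12.py "$1"
process_one() {
    sleep 0.1
    : > "$1.done"
}

# Demo setup: a scratch directory with a few dummy inputs.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch img1.png "img 2.png" img3.png img4.png img5.png img6.png

for i in *.png; do
    # If every slot is busy, block until any one background job exits.
    while (( $(jobs -r | wc -l) >= max_jobs )); do
        wait -n
    done
    process_one "$i" &
done
wait    # let the remaining jobs finish

done_files=(*.done)
echo "finished ${#done_files[@]} files"
```

The difference from the sleep loop is that a new job starts as soon as a slot frees up, rather than up to 100 ms later.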

This is a nice hack if you just want a quick working solution, but you might also want to investigate what GNU Parallel can do.

Upvotes: 1
