his name

Reputation: 23

Running shell script loop in parallel

I wrote a shell script which

  1. gets a list of all image files from a directory
  2. creates a new folder for each new image if needed
  3. optimizes each image to save storage space

I've tried putting parallel -j "$(nproc)" before mogrify, but found that this was wrong, because DIR and mkdir have to run before mogrify. What I need instead is something like & at the end of the mogrify line, but limited to n processes at a time.
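What the question describes, an & after the expensive command but capped at n concurrent jobs, can be done in plain bash (4.3+ for wait -n). A minimal sketch; process_one and the item list are hypothetical stand-ins for the real per-image mkdir + mogrify work:

```shell
#!/bin/bash
# Sketch: cap background jobs at max_jobs using & plus wait -n (bash 4.3+).
max_jobs=4                      # e.g. "$(nproc)"
outdir=/tmp/waitn-demo
rm -rf "$outdir" && mkdir -p "$outdir"

process_one() {
  sleep 0.1                     # placeholder for the expensive command
  touch "$outdir/$1.done"
}

for item in a b c d e f; do
  # If max_jobs jobs are already running, block until one of them finishes.
  while (( $(jobs -rp | wc -l) >= max_jobs )); do
    wait -n
  done
  process_one "$item" &
done
wait                            # wait for the remaining jobs
echo "processed $(ls "$outdir" | wc -l) files"
```

The loop body before the & still runs serially, which is exactly the property the question asks for.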

The current code looks like this:

#!/bin/bash

find "$1" -type f \( -iname "*.jpg" -o -iname "*.jpeg" -o -iname "*.png" -o -iname "*.gif" \) | while IFS= read -r IMAGE
do
    DIR="$2"/$(dirname "$IMAGE")
    echo "$IMAGE > $DIR"
    mkdir -p "$DIR"
    mogrify -path "$DIR" -resize "6000000@>" -filter Triangle -define filter:support=2 -unsharp 0.25x0.08+8.3+0.045 -dither None -posterize 136 -quality 82 -define jpeg:fancy-upsampling=off -define png:compression-filter=5 -define png:compression-level=9 -define png:compression-strategy=1 -define png:exclude-chunk=all -interlace none -colorspace sRGB "$IMAGE"
done

exit 0

Can someone suggest the right way to run such a script in parallel? Each image takes about 15 seconds to process.

Upvotes: 2

Views: 2776

Answers (2)

Ole Tange

Reputation: 33685

Make a bash function that deals correctly with one file and call that in parallel:

#!/bin/bash

doit() {
  IMAGE="$1"
  DIR="$2"/$(dirname "$IMAGE")
  echo "$IMAGE > $DIR"
  mkdir -p "$DIR"
  mogrify -path "$DIR" -resize "6000000@>" -filter Triangle -define filter:support=2 -unsharp 0.25x0.08+8.3+0.045 -dither None -posterize 136 -quality 82 -define jpeg:fancy-upsampling=off -define png:compression-filter=5 -define png:compression-level=9 -define png:compression-strategy=1 -define png:exclude-chunk=all -interlace none -colorspace sRGB "$IMAGE"
}
export -f doit

find "$1" -type f \( -iname "*.jpg" -o -iname "*.jpeg" -o -iname "*.png" -o -iname "*.gif" \) |
    parallel doit {} "$2"

The default for GNU Parallel is to run one job per CPU thread, so -j "$(nproc)" is not needed.

This has less overhead than starting sem for each file (sem = 0.2 sec per call, parallel = 7 ms per call).

Upvotes: 0

that other guy

Reputation: 123450

When you have a shell loop that does some setup and invokes an expensive command, the way to parallelize it is to use sem from GNU parallel:

for i in {1..10}
do
  echo "Doing some stuff"
  sem -j +0 sleep 2
done
sem --wait

This lets the loop run and do its setup as normal, while scheduling the expensive commands to run in parallel (-j +0 means as many jobs as there are CPU cores); sem --wait then blocks until every queued job has finished.
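Applied to the question's loop, only the expensive line changes. A sketch with sleep standing in for mogrify; the --id name imgbatch is a hypothetical label so the final --wait matches this batch:

```shell
#!/bin/bash
# Sketch: the question's loop shape, with only the slow step handed to sem.
# Skip gracefully if GNU parallel (which provides sem) is not installed.
command -v sem >/dev/null || { echo "GNU parallel not installed"; exit 0; }

rm -rf /tmp/sem-demo
for i in 1 2 3; do
  mkdir -p "/tmp/sem-demo/$i"          # cheap setup still runs serially
  sem -j +0 --id imgbatch sleep 0.1    # queue only the expensive command
done
sem --wait --id imgbatch               # block until every queued job is done
echo "batch complete"
```

The setup (dirname, echo, mkdir) keeps its original ordering; only the command handed to sem is deferred and parallelized.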

Upvotes: 2
