Bioinfoguy

Reputation: 99

Running 2 loops in parallel in bash scripting

I have 20 files, and I want to run 2 operations on each of them, which takes about 30 minutes per file. I wrote a script that takes a file containing a list of the file names and iterates over them in a for loop. I found that if I write 2 for loops, one operating on the first half of the data and the other on the second half, with "&" after each done, the run time is cut in half. Is this considered parallelism?

The code looks like this:

    #!/bin/bash

    for i in $(cat $1); do sample+=($i); done
    tLen=${#sample[@]}

    #loop works on first half of the data

    for (( i=0; i<${tLen}/2; i++ ));
    do
        # operation 1 on ${sample[$i]}
        # operation 2 on ${sample[$i]} which is dependent on operation 1
    done &

    #loop works on second half of the data

    for (( i=${tLen}/2; i<${tLen}; i++ ));
    do
        # operation 1 on ${sample[$i]}
        # operation 2 on ${sample[$i]} which is dependent on operation 1
    done &

This reduced the run time from about 10 hours to about 5 hours! Is there a way to specify the number of chunks to divide the files into, and run a separate for loop on each chunk? For example, if I pass 4 to the script as a parameter, it would divide the files into 4 chunks (5 files each) and run 4 separate for loops, one per chunk, in parallel in the background, so that the time becomes about 2.5 hours?

Upvotes: 0

Views: 569

Answers (1)

Edouard Thiel

Reputation: 6208

Yes, this is parallelism.

Here is an example for n chunks:

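    #! /bin/bash

    # Read the file names listed in the file given as first argument
    for i in $(< "$1"); do sample+=("$i"); done
    tLen=${#sample[@]}
    nChunks=4

    # Start one background loop per chunk
    for ((j = 0; j < nChunks; j++)) ; do
        for (( i=tLen*j/nChunks; i<tLen*(j+1)/nChunks; i++ )); do
            # operation on ${sample[$i]}
            :   # no-op placeholder: bash rejects a loop body that holds only comments
        done &
    done

    # Now wait for termination
    wait
    echo "Done."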
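
The chunk count can also be passed on the command line, as the question asks. The following is only a minimal sketch of that idea: `process_sample` and `run_in_chunks` are hypothetical names introduced here for illustration, and the `echo` stands in for the two real operations.

```shell
#!/bin/bash

# Hypothetical stand-in for the two dependent operations on one file.
process_sample() {
    local name=$1
    # operation 1 on "$name", then operation 2 (which depends on operation 1)
    echo "processed $name"
}

# Split the names listed in file $1 (one per line) into $2 contiguous
# chunks (default 4) and run one background loop per chunk.
run_in_chunks() {
    local listFile=$1 nChunks=${2:-4}
    local -a sample=()
    local line i j tLen

    # Read one name per line, skipping empty lines; the extra test keeps
    # a last line that has no trailing newline.
    while IFS= read -r line || [[ -n $line ]]; do
        [[ -n $line ]] && sample+=("$line")
    done < "$listFile"
    tLen=${#sample[@]}

    for ((j = 0; j < nChunks; j++)); do
        for ((i = tLen*j/nChunks; i < tLen*(j+1)/nChunks; i++)); do
            process_sample "${sample[i]}"
        done &
    done

    wait   # block until every background loop has finished
}

# Example call: run_in_chunks files.txt 4
```

With 20 files and 4 chunks, each background loop handles 5 files. The expected speed-up only holds while the machine has enough free CPU cores and disk bandwidth for all the loops at once, so asking for more chunks than cores is unlikely to help.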

Upvotes: 2
