Reputation: 99
I have 20 files, and I want to run 2 operations on each file, taking about 30 minutes per file. I wrote a script that takes a file containing a list of the file names and iterates over each one in a for loop. I found that if I wrote 2 for loops, one operating on the first half of the data and the other on the second half, using "&" after done, the time was reduced by half. Is this considered parallelism?
The code looks like this:
#!/bin/bash
for i in $(cat "$1"); do sample+=("$i"); done
tLen=${#sample[@]}
#loop works on first half of the data
for (( i=0; i<${tLen}/2; i++ ));
do
# operation 1 on ${sample[$i]}
# operation 2 on ${sample[$i]} which is dependent on operation 1
done &
#loop works on second half of the data
for (( i=${tLen}/2; i<${tLen}; i++ ));
do
# operation 1 on ${sample[$i]}
# operation 2 on ${sample[$i]} which is dependent on operation 1
done &
By this, the time was reduced from about 10 hours to 5 hours! Is there a way to specify the number of chunks to divide the files into, and run a separate for loop on each chunk? For example, if I give 4 to the script as a parameter, it divides the files into 4 chunks (5 each) and runs 4 separate for loops, one per quarter, in parallel in the background, so the time becomes 2.5 hours?
Upvotes: 0
Views: 569
Reputation: 6208
Yes, this is parallelism.
Here is an example for n chunks:
#! /bin/bash
for i in $(< "$1"); do sample+=("$i"); done
tLen=${#sample[@]}
nChunks=4
for ((j = 0; j < nChunks; j++)) ; do
for (( i=tLen*j/nChunks; i<tLen*(j+1)/nChunks; i++ )); do
# operation on ${sample[$i]}
done &
done
# Now wait for termination
wait
echo "Done."
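To take the chunk count from the command line as you describe, you can read it from `$2`. Here is a minimal, self-contained sketch along those lines; the demo data, the `/tmp` paths, and the `echo` standing in for your real operations are placeholders, not part of your original script:

```shell
#!/bin/bash
# Sketch: chunk count passed as the second argument, e.g. ./run.sh files.txt 4
# Demo file list stands in for your real list of 20 file names.
printf '%s\n' a b c d e f g h i j > /tmp/files.txt
rm -f /tmp/chunk_*.log

listfile=${1:-/tmp/files.txt}
nChunks=${2:-4}

# mapfile avoids the word-splitting pitfalls of $(cat "$1")
mapfile -t sample < "$listfile"
tLen=${#sample[@]}

for ((j = 0; j < nChunks; j++)); do
  (
    for ((i = tLen * j / nChunks; i < tLen * (j + 1) / nChunks; i++)); do
      # operation 1 then operation 2 on "${sample[$i]}" would go here
      echo "chunk $j: ${sample[$i]}" >> "/tmp/chunk_$j.log"
    done
  ) &
done
wait
echo "All chunks done."
```

The integer arithmetic `tLen * j / nChunks` distributes the items evenly even when `tLen` is not divisible by `nChunks`, and `wait` keeps the script alive until every background chunk finishes.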
Upvotes: 2