jorgehumberto

Reputation: 1097

Run two sequential bash scripts in parallel 4 times

I have a list of configuration files:

cfg1.cfg
cfg2.cfg
cfg3.cfg
cfg4.cfg
cfg5.cfg
cfg6.cfg
cfg7.cfg
...

that serve as input for two scripts:

script1.sh
script2.sh

which I run sequentially as follows:

script1.sh cfgX.cfg && script2.sh cfgX.cfg

where X=1, 2, 3, ...

These scripts are not parallelised and take a long time to run. How can I launch them in parallel, let's say 4 at a time, so I do not kill the server where I run them?

For just one script I tried a brute force approach similar to:

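export COUNTER_LIMIT=4
export COUNTER=1

# Iterate over the glob directly instead of parsing the output of ls
for each in *.cfg
do
    INSTRUCTION="./script1.sh $each"

    if (( COUNTER >= COUNTER_LIMIT )); then
        # Every COUNTER_LIMIT-th job runs in the foreground, then we pause
        $INSTRUCTION &&
        export COUNTER=$((COUNTER - COUNTER_LIMIT))
        echo
        sleep 600s
    else
        # All other jobs are sent to the background
        $INSTRUCTION &
        sleep 5s
    fi

    echo "$COUNTER"
    export COUNTER=$((COUNTER + 1))
done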

(The sleeps are there because, for some reason, the scripts cannot be started at the same time...)

So, how can I make sure that the double ampersand in

script1.sh cfgX.cfg && script2.sh cfgX.cfg

doesn't block the brute-force parallelisation?

I also accept better and simpler approaches ;)

Cheers jorge

UPDATE

I should have mentioned that the config files are not necessarily sequentially named and can have any name, I just made them like this to make the example as simple as possible.

Upvotes: 3

Views: 379

Answers (3)

Jetchisel

Reputation: 7781

I did a small simulation test. First, I created the file like you describe:

printf '%s\n' cfg{1..100}.cfg > file.txt

Now the script to process it.

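#!/bin/bash

file=file.txt
limit=2

# Read the config names, one per line, into an array
array=()
while read -r cfg; do
  array+=("$cfg")
done < "$file"

# Start $limit background loops; loop n handles entries n, n+limit, n+2*limit, ...
for ((n=0; n<limit; n++)); do
  for ((i=n; i<${#array[@]}; i+=limit)); do
    echo script1.sh "${array[i]}" && echo script2.sh "${array[i]}" && sleep 2; echo
  done &
done

# Block until every background loop has finished
wait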

If you run that script you can see what is going to happen: each of the limit background loops works through every limit-th entry in turn, so at most limit pairs run at once, and within each pair script2.sh only starts after script1.sh has succeeded. The echo and sleep are there just as a visual aid :-); remove them if you decide to actually run the scripts. Change the value of limit to your heart's content. The idea and technique for solving this particular problem did not come from me; it came from this guy: https://github.com/e36freak/. Give credit where it is due...
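To run the real scripts instead of the echo simulation, and to accept arbitrary file names (as in the update to the question), the same round-robin idea can be wrapped in a function that takes the config names as arguments, for example from a glob. This is a minimal sketch, not the answer author's code: run_round_robin, SCRIPT1 and SCRIPT2 are hypothetical names introduced here, with the script paths defaulting to the ones from the question.

```shell
#!/usr/bin/env bash
# Minimal sketch of the round-robin technique, running the real scripts.
# SCRIPT1 / SCRIPT2 are hypothetical overridable variables; by default they
# point at the scripts from the question.
SCRIPT1=${SCRIPT1:-./script1.sh}
SCRIPT2=${SCRIPT2:-./script2.sh}

# run_round_robin LIMIT CFG...  (hypothetical helper)
# Starts LIMIT background loops; loop n processes arguments n, n+LIMIT,
# n+2*LIMIT, ... one at a time, so at most LIMIT pairs run concurrently.
run_round_robin() {
  local limit=$1
  shift
  local -a cfgs=("$@")
  local n i
  for ((n = 0; n < limit; n++)); do
    for ((i = n; i < ${#cfgs[@]}; i += limit)); do
      # script2 starts only after script1 has succeeded for the same file
      "$SCRIPT1" "${cfgs[i]}" && "$SCRIPT2" "${cfgs[i]}"
    done &
  done
  # Block until every background loop has finished
  wait
}
```

Usage: run_round_robin 4 *.cfg. Because each loop works through its share strictly in order, one slow file holds up the rest of that loop's share even when other loops are already idle; a job queue such as GNU parallel or xargs -P hands out the next file to whichever slot frees up first.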

Upvotes: 1

user3666197

Reputation: 1

parallel --jobs 4  \
         --load 50% \
         --bar       \
         --eta "( echo 1st-for-{}; echo 2nd-for-{} )" < aListOfAdHocArguments.txt
0% 0:5=0s
1st-for-Abraca
2nd-for-Abraca
20% 1:4=0s
1st-for-Dabra
2nd-for-Dabra
40% 2:3=0s
1st-for-Hergot
2nd-for-Hergot
60% 3:2=0s
1st-for-Fagot
2nd-for-Fagot
80% 4:1=0s

100% 5:0=0s

Q: How can I launch them in parallel, let's say 4 at a time, so I do not kill the server where I run them?

A lovely task for GNU parallel.

First, let's check the localhost ecosystem (executing parallel jobs on ssh-connected remote hosts is also possible, but exceeds the scope of this post):

parallel --number-of-cpus
parallel --number-of-cores
parallel --show-limits

For configuration details beyond --jobs 4, such as --memfree or --noswap, --load <max-load> or --keep-order, and --results <aFile> or --output-as-files, see:

man parallel

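For your two scripts (the && starts script2.sh only if script1.sh succeeded for the same file, as in your original command):

parallel --jobs 4 \
         --bar     \
         --eta "( script1.sh cfg{}.cfg && script2.sh cfg{}.cfg )" ::: {1..123}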

Here, the scripts are emulated by just a pair of tandem echo commands over a down-counting index range, so the jobs finish instantly: the progress bar is barely visible and the --eta (Estimated Time of Arrival) figures are practically zero:

parallel --jobs 4  \
         --load 50% \
         --bar       \
         --eta "( echo 1st-for-cfg-{}; echo 2nd-for-cfg-{} )" ::: {10..0}
0% 0:11=0s 7
1st-for-cfg-10
2nd-for-cfg-10
9% 1:10=0s 6
1st-for-cfg-9
2nd-for-cfg-9
18% 2:9=0s 5
1st-for-cfg-8
2nd-for-cfg-8
27% 3:8=0s 4
1st-for-cfg-7
2nd-for-cfg-7
36% 4:7=0s 3
1st-for-cfg-6
2nd-for-cfg-6
45% 5:6=0s 2
1st-for-cfg-5
2nd-for-cfg-5
54% 6:5=0s 1
1st-for-cfg-4
2nd-for-cfg-4
63% 7:4=0s 0
1st-for-cfg-3
2nd-for-cfg-3
72% 8:3=0s 0
1st-for-cfg-2
2nd-for-cfg-2
81% 9:2=0s 0
1st-for-cfg-1
2nd-for-cfg-1
90% 10:1=0s 0
1st-for-cfg-0
2nd-for-cfg-0

Update

You added:

I should have mentioned that the config files are not necessarily sequentially named and can have any name, I just made them like this to make the example as simple as possible.

Passing the names with < list_of_arguments solves this ex-post change of the problem definition, because parallel then reads one argument per line from standard input:

parallel [options] [command [arguments]] < list_of_arguments
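If the names only exist as files on disk, they can also be piped in directly. A minimal sketch, assuming GNU parallel (not the unrelated parallel tool from moreutils) is installed; run_pairs_parallel, SCRIPT1 and SCRIPT2 are hypothetical names. The names are written NUL-separated and read with --null, so names containing spaces or even newlines arrive intact, and GNU parallel shell-quotes each name when it substitutes {}.

```shell
#!/usr/bin/env bash
# Hypothetical variables, exported so the shell that GNU parallel starts
# for each job can expand them.
export SCRIPT1=${SCRIPT1:-./script1.sh}
export SCRIPT2=${SCRIPT2:-./script2.sh}

# run_pairs_parallel JOBS CFG...  (hypothetical helper)
run_pairs_parallel() {
  local jobs=$1
  shift
  # One NUL-terminated name per config; --null makes parallel split on NUL.
  # The command is single-quoted so $SCRIPT1/$SCRIPT2 are expanded by each
  # job's shell, and {} is replaced by the (quoted) config name.
  printf '%s\0' "$@" |
    parallel --null --jobs "$jobs" '"$SCRIPT1" {} && "$SCRIPT2" {}'
}
```

Usage: run_pairs_parallel 4 *.cfg. Without the helper, the same thing can be written inline as parallel --jobs 4 './script1.sh {} && ./script2.sh {}' ::: *.cfg, where ::: takes the arguments from the command line instead of standard input.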

Upvotes: 4

larsks

Reputation: 311298

This would be fairly simple with find and xargs. The following runs four processes in parallel and, for any given config file, completes script1.sh before running script2.sh:

find . -name '*.cfg' -print0 | xargs -0 -P 4 -I CFG sh -c 'script1.sh CFG && script2.sh CFG'
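Substituting the file name into the sh -c string means the name is re-parsed as shell code, which can break on names containing quotes or other special characters. A minimal sketch of a common alternative, assuming GNU or BSD xargs (both support -P, which is not POSIX): the name is passed to sh as a positional parameter instead. run_pairs_xargs, SCRIPT1 and SCRIPT2 are hypothetical names introduced here.

```shell
#!/usr/bin/env bash
# Hypothetical variables, exported so the sh started by xargs can read them.
export SCRIPT1=${SCRIPT1:-./script1.sh}
export SCRIPT2=${SCRIPT2:-./script2.sh}

# run_pairs_xargs DIR  (hypothetical helper)
run_pairs_xargs() {
  local dir=$1
  # -print0 / -0: NUL-separated names, safe for spaces and newlines.
  # -n 1: one file per sh invocation; -P 4: up to four invocations at once.
  # The file name arrives as "$1" (the trailing "sh" fills $0), so it is
  # never re-parsed as shell code.
  find "$dir" -name '*.cfg' -print0 |
    xargs -0 -n 1 -P 4 sh -c '"$SCRIPT1" "$1" && "$SCRIPT2" "$1"' sh
}
```

Usage: run_pairs_xargs . (run from the directory holding the configs). Note that GNU xargs runs the command once even when find matches nothing; its -r option suppresses that.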

Upvotes: 2
