Reputation: 1097
I have a list of configuration files:
cfg1.cfg
cfg2.cfg
cfg3.cfg
cfg4.cfg
cfg5.cfg
cfg6.cfg
cfg7.cfg
...
that serve as input for two scripts:
script1.sh
script2.sh
which I run sequentially as follows:
script1.sh cfgX.cfg && script2.sh cfgX.cfg
where X=1, 2, 3, ...
These scripts are not parallelised and take a long time to run. How can I launch them in parallel, let's say 4 at a time, so I do not kill the server where I run them?
For just one script I tried a brute force approach similar to:
export COUNTER_LIMIT=4
export COUNTER=1
for each in $(ls *.cfg)
do
INSTRUCTION="./script1.sh $each "
if (($COUNTER >= $COUNTER_LIMIT)) ;
then
$INSTRUCTION &&
export COUNTER=$(($COUNTER-$COUNTER_LIMIT));
echo
sleep 600s
else
$INSTRUCTION &
sleep 5s
fi
echo $COUNTER
export COUNTER=$(($COUNTER+1));
done
(the sleeps are because for some reason the scripts cannot be initiated at the same time...)
So, how can I make sure that the double ampersands in
script1.sh cfgX.cfg && script2.sh cfgX.cfg
don't block the brute force parallelisation?
I also accept better and simpler approaches ;)
Cheers jorge
UPDATE
I should have mentioned that the config files are not necessarily sequentially named and can have any name, I just made them like this to make the example as simple as possible.
Upvotes: 3
Views: 379
Reputation: 7781
I ran a simulation test. First I created the file as you describe.
printf '%s\n' cfg{1..100}.cfg > file.txt
Now the script to process it.
#!/bin/bash
file=file.txt
limit=2
array=()
while read -r cfg; do
array+=("$cfg")
done < "$file"
for ((n=0; n<limit; n++)); do
for ((i=n; i<${#array[@]}; i+=limit)); do
echo script1.sh "${array[i]}" && echo script2.sh "${array[i]}" && sleep 2; echo
done &
done
wait
Now if you run that script you should see what is going to happen. Each of the limit background loops walks the array in strides of limit, so at most limit config files are being processed at any moment. The echo and sleep are there just as a visual aid :-); you can remove them once you decide to actually run the scripts. Change the value of limit to your heart's content. The idea and technique for solving this particular problem did not come from me. It came from this guy: https://github.com/e36freak/. Give credit where it is due...
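A related variant, for comparison: the striped loops above assign each config to a fixed lane, so one slow config delays everything queued behind it in that lane. A minimal sketch of a bounded pool using wait -n (bash 4.3 or newer) starts the next chain as soon as any slot frees up. The step1 and step2 functions, the log file and the config names are hypothetical stand-ins for script1.sh, script2.sh and the real files; this is not e36freak's technique.

```bash
#!/usr/bin/env bash
# Sketch of a bounded job pool; needs bash 4.3+ for `wait -n`.

limit=4                 # maximum number of chains running at once
log=$(mktemp)           # hypothetical log file, only used by this demo

# Hypothetical stand-ins for script1.sh and script2.sh.
step1() { echo "step1 $1" >> "$log"; }
step2() { echo "step2 $1" >> "$log"; }

# Hypothetical config names; in real use this could be cfgs=(*.cfg).
cfgs=(alpha.cfg beta.cfg gamma.cfg delta.cfg epsilon.cfg zeta.cfg)

running=0
for cfg in "${cfgs[@]}"; do
    if (( running >= limit )); then
        wait -n                     # block until any one background chain exits
        running=$((running - 1))
    fi
    # The subshell keeps the && chain together: the chain as a whole is
    # backgrounded, and step2 still runs only if step1 succeeded.
    ( step1 "$cfg" && step2 "$cfg" ) &
    running=$((running + 1))
done
wait                                # wait for the chains still running
```

Wrapping the chain in ( ... ) is what stops the double ampersand from blocking the loop: the loop backgrounds the subshell, and the sequencing happens inside it.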
Upvotes: 1
Reputation: 1
parallel --jobs 4 \
--load 50% \
--bar \
--eta "( echo 1st-for-{}; echo 2nd-for-{} )" < aListOfAdHocArguments.txt
0% 0:5=0s
1st-for-Abraca
2nd-for-Abraca
20% 1:4=0s
1st-for-Dabra
2nd-for-Dabra
40% 2:3=0s
1st-for-Hergot
2nd-for-Hergot
60% 3:2=0s
1st-for-Fagot
2nd-for-Fagot
80% 4:1=0s
100% 5:0=0s
Q : How can I launch them in parallel, let's say 4 at a time, so I do not kill the server where I run them?
A lovely task for GNU parallel.
First let's check the localhost ecosystem (exosystems, i.e. executing parallel jobs over ssh-connected remote hosts, are possible too, yet exceed the scope of this post):
parallel --number-of-cpus
parallel --number-of-cores
parallel --show-limits
For more configuration details beyond --jobs 4, potentially --memfree or --noswap, --load <max-load> or --keep-order, and --results <aFile> or --output-as-files:
man parallel
parallel --jobs 4 \
--bar \
--eta "( script1.sh cfg{}.cfg && script2.sh cfg{}.cfg )" ::: {1..123}
Here it is emulated by just a pair of tandem echo commands over down-counted indexes, so the progress bars are invisible and the Estimated-Time-of-Arrival (--eta) indications are almost instant:
parallel --jobs 4 \
--load 50% \
--bar \
--eta "( echo 1st-for-cfg-{}; echo 2nd-for-cfg-{} )" ::: {10..0}
0% 0:11=0s 7
1st-for-cfg-10
2nd-for-cfg-10
9% 1:10=0s 6
1st-for-cfg-9
2nd-for-cfg-9
18% 2:9=0s 5
1st-for-cfg-8
2nd-for-cfg-8
27% 3:8=0s 4
1st-for-cfg-7
2nd-for-cfg-7
36% 4:7=0s 3
1st-for-cfg-6
2nd-for-cfg-6
45% 5:6=0s 2
1st-for-cfg-5
2nd-for-cfg-5
54% 6:5=0s 1
1st-for-cfg-4
2nd-for-cfg-4
63% 7:4=0s 0
1st-for-cfg-3
2nd-for-cfg-3
72% 8:3=0s 0
1st-for-cfg-2
2nd-for-cfg-2
81% 9:2=0s 0
1st-for-cfg-1
2nd-for-cfg-1
90% 10:1=0s 0
1st-for-cfg-0
2nd-for-cfg-0
You added:
I should have mentioned that the config files are not necessarily sequentially named and can have any name, I just made them like this to make the example as simple as possible.
Feeding the names in via < list_of_arguments solves this ex-post changed problem definition:
parallel [options] [command [arguments]] < list_of_arguments
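As a concrete instance of that synopsis, here is a small sketch that builds the argument list from whatever *.cfg files exist and keeps the question's && between the two steps. The file names are made up for the demo and echo stands in for the real scripts; the block skips the run if GNU parallel is not the parallel found on PATH.

```bash
#!/usr/bin/env bash
# Sketch: arbitrary config names, one per line, fed to GNU parallel on stdin.

workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical, non-sequential names (one contains a space on purpose).
touch "first run.cfg" alpha.cfg zeta-final.cfg
printf '%s\n' *.cfg > list_of_arguments

# Only run when the installed `parallel` really is GNU parallel.
if parallel --version 2>/dev/null | grep -q 'GNU parallel'; then
    # {} is replaced by one input line, shell-quoted by parallel itself.
    # The echo commands stand in for script1.sh and script2.sh.
    parallel --jobs 4 'echo script1.sh {} && echo script2.sh {}' \
        < list_of_arguments > out.txt
fi
```

One name per line is enough as long as no file name contains a newline.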
Upvotes: 4
Reputation: 311298
This would be fairly simple with find and xargs. The following runs four processes in parallel and, for any given config file, completes script1.sh before running script2.sh:
find . -name '*.cfg' -print0 | xargs -0 -P 4 -I CFG sh -c 'script1.sh CFG && script2.sh CFG'
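If the config names may contain spaces or shell metacharacters, substituting them straight into the sh -c string can break. A hedged variant of the same pipeline passes each name as a positional parameter instead, so the inner shell sees it as "$1". The temporary directory and file names below are assumptions for the demo, and echo stands in for the real scripts.

```bash
#!/usr/bin/env bash
# Sketch: same find | xargs idea, but the file name arrives as "$1".

workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch "first run.cfg" alpha.cfg     # hypothetical config files

# -n 1 hands one file name to each sh invocation; the trailing `sh`
# fills $0, so the file name becomes $1 inside the quoted command.
find . -name '*.cfg' -print0 |
    xargs -0 -P 4 -n 1 sh -c 'echo script1.sh "$1" && echo script2.sh "$1"' sh \
    > out.txt
```

Replacing the two echo commands with the real script names gives the same four-at-a-time behaviour as the one-liner above.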
Upvotes: 2