Reputation: 115
We'd like to parse tons of coordinates and do something with them using multiple workers. What we have:
coords.txt
100, 100, 100
244, 433, 233
553, 212, 432
776, 332, 223
...
8887887, 5545554, 2243234
worker.sh
coord_reader='^([0-9]+), ([0-9]+), ([0-9]+)$'
# the '|| [[ -n "$line" ]]' also handles a last line without a trailing newline
while IFS='' read -r line || [[ -n "$line" ]]; do
    if [[ $line =~ $coord_reader ]]; then
        x=${BASH_REMATCH[1]}
        y=${BASH_REMATCH[2]}
        z=${BASH_REMATCH[3]}
        echo "x is $x, y is $y, z is $z"
    fi
done < "$1"
To execute worker.sh we call bash worker.sh coords.txt
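With the coords.txt above, this prints one line per matching coordinate:
x is 100, y is 100, z is 100
x is 244, y is 433, z is 233
x is 553, y is 212, z is 432
...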
Because we have millions of coordinates, we need to split coords.txt and create multiple workers doing the same task, like coordsaa, coordsab, coordsac, one file per worker.
So we split coords.txt using split:
split -l 1000 coords.txt coords
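split appends alphabetical suffixes to the prefix you give it, so this produces:
coordsaa  coordsab  coordsac  ...
each holding 1000 lines (the last one possibly fewer).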
But, how to assign one file per worker?
I am new to Stack Overflow, feel free to comment so I can improve my asking skills.
Upvotes: 2
Views: 1393
Reputation: 207445
I would do this with GNU Parallel. Say you want 8 workers running at a time till all the processing is done:
parallel -j 8 --pipepart -a coords.txt --fifo bash worker.sh {}
where:
-j 8 runs 8 jobs at a time;
--pipepart -a coords.txt splits coords.txt into chunks on the fly, with no temporary split files;
--fifo serves each chunk through a temporary FIFO and substitutes the FIFO's path for {} to pass to your worker script.
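Your worker.sh needs no changes: with --fifo the chunk arrives as a file path in $1, just as coords.txt does now. Conceptually, for each chunk parallel does something like the following (the FIFO path is illustrative; parallel generates its own temporary name):
mkfifo /tmp/coords_part.fifo                      # illustrative name
bash worker.sh /tmp/coords_part.fifo &            # worker reads the chunk via "$1"
head -n 1000 coords.txt > /tmp/coords_part.fifo   # one chunk of the input
wait
rm /tmp/coords_part.fifo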
Upvotes: 3
Reputation: 3603
To run workers from bash that process a lot of files:
File layout:
files/ runner.sh worker.sh
files/: a folder with a lot of files (for example 1000)
runner.sh: launches one worker per file
worker.sh file: the task that processes a single file
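To reproduce this layout for a quick test (the file names below are made up for illustration):
mkdir -p files
for i in $(seq 1 1000); do
    echo "data ${i}" > "files/file_${i}.txt"    # dummy files to process
done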
For example:
worker.sh:
#!/usr/bin/env bash
sleep 5      # simulate some work
echo "$1"    # print the file this worker handled
To process all the files in files/, one file per worker, do:
runner.sh:
#!/usr/bin/env bash
n_processes=$(find files/ -type f | wc -l)
echo "spawning ${n_processes} workers"
for file in $(find files/ -type f); do    # 'do', not 'then'; assumes file names without whitespace
    bash worker.sh "${file}" &
done
wait    # wait for all workers to finish
/!\ 1000 processes is a lot !!
It is better to create a "pool of processes": it guarantees that at most a maximum number of processes run at the same time (an old child process is not reused for a new task; it dies when its task is done or fails):
#!/usr/bin/env bash
n_processes=8
echo "max of processes: ${n_processes}"
for file in $(find files/ -type f); do
    # busy-wait until a slot is free; '-ge' (not '-gt') keeps at most n_processes running
    while [[ $(jobs -r | wc -l) -ge ${n_processes} ]]; do
        :
    done
    bash worker.sh "${file}" &
    echo "spawned worker pid: $!"
done
wait
It is not really a pool of processes, but it avoids having many processes alive at the same time; the maximum number alive at once is given by n_processes.
Execute with bash runner.sh.
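A variant of the same loop that blocks instead of busy-waiting, assuming bash 4.3+ for wait -n (a minimal sketch, not part of the original answer):
#!/usr/bin/env bash
n_processes=8
for file in files/*; do
    # when all slots are busy, block until any one worker exits (bash 4.3+)
    while (( $(jobs -r | wc -l) >= n_processes )); do
        wait -n
    done
    bash worker.sh "${file}" &
done
wait    # wait for the remaining workers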
Upvotes: 4