Garrett

Reputation: 11725

Tailing multiple files in the background with bash

I've never written anything this intense in bash. Basically, I want to run a limited number of data import scripts in parallel. To do so, I need to know when one has terminated in order to start the next. However, I'm not sure how to do this in parallel. The following works synchronously:

# watch the output file for "DONE!"
tail -f "$outputfile" | while IFS= read -r OUTPUT
do
  if [[ "${OUTPUT}" == *"DONE!"* ]]
  then
    runNextScript
  fi
done

How can I run this asynchronously?

Upvotes: 1

Views: 79

Answers (1)

Sylvain Leroux

Reputation: 51990

Basically, I want to run a limited number of data import scripts in parallel. To do so, I need to know when one has terminated in order to start the next.

One way of doing that is to create a fifo containing as many tokens as the maximum number of concurrent scripts.

Then, before launching a task, you first consume a token, then launch the task, and finally put the token back into the fifo once the task has finished. That way, when the maximum number of running scripts is reached, the next one is blocked until a token becomes available.
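To see the token mechanics in isolation, here is a minimal bash sketch (my own illustration, not part of the proof of concept): two tokens are put into a fifo opened read/write on fd 3, two "tasks" take them, and a third read finds nothing. A real worker pool would simply block on that read; the 1-second `read -t` timeout is used here only to make the empty state observable. The variable names `taken` and `blocked` are hypothetical.

```shell
#!/bin/bash
# Token semantics in isolation: a fifo opened read/write on fd 3 holds the tokens.
dir=$(mktemp -d)
mkfifo "$dir/tokens"
exec 3<>"$dir/tokens"   # read/write open of a fifo does not wait for a peer (Linux)
rm -r "$dir"            # fd 3 stays usable after the path is removed

echo T >&3; echo T >&3  # put 2 tokens in: at most 2 tasks may run

taken=0
read -r -u 3 && (( ++taken ))   # task 1 takes a token
read -r -u 3 && (( ++taken ))   # task 2 takes a token

# The fifo is now empty: a plain read would block until a token comes back.
# The 1-second timeout is only here to show that nothing is available.
if read -r -t 1 -u 3; then blocked=no; else blocked=yes; fi

echo T >&3                      # task 1 finishes and gives its token back
read -r -u 3 && (( ++taken ))   # so task 3 can start

echo "tokens taken: $taken, read without a token blocked: $blocked"
exec 3>&-
```

Running it should print `tokens taken: 3, read without a token blocked: yes`.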

Not clear? Here is a proof of concept (you will definitely have to adapt it to your needs!):

  • master.sh
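#!/bin/bash

# Create the named pipe that holds the tokens
rm -f fifo
mkfifo fifo

# Open it read/write on fd 3 (on Linux this open does not block)
exec 3<>fifo

# Simulate 26 tasks
tasks=({a..z})

# Insert 5 tokens in the fifo,
# i.e. at most 5 workers running at the same time
for i in {1..5}; do
    (echo T >&3; echo Insert token) &
done

# Launch a task each time a token is available
for task in "${tasks[@]}"; do
    read -r <&3                              # take a token (blocks if none is left)
    ( ./worker.sh "$task"; echo T >&3 ) &    # give it back when the worker is done
done

wait

# Clean up
exec 3>&-
rm -f fifo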
  • worker.sh (not much of interest: it just simulates doing some work)

#!/bin/bash

# simulate doing some stuff
S=$(( RANDOM % 10 ))
echo "$(exec date +%s) PID$$ doing task $1 for $S"
sleep $S
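For experimenting, the two scripts can also be folded into a single self-contained file. This is a sketch under assumptions of my own: the worker is a shell function instead of `./worker.sh`, the fifo lives in a `mktemp -d` directory, and the names `max_jobs`, `worker` and `log` are hypothetical. Each worker appends `+` to a log when it starts and `-` just before its token is returned, so replaying the log gives the peak number of workers that ran at once. (The session transcript that follows comes from the original two scripts, not from this variant.)

```shell
#!/bin/bash
# Single-file variant of the token-fifo pattern (hypothetical names).
max_jobs=3
dir=$(mktemp -d)
log="$dir/log"
mkfifo "$dir/fifo"
exec 3<>"$dir/fifo"           # read/write open: does not block on Linux

worker() {                    # stands in for ./worker.sh
    echo + >>"$log"           # record "started"
    echo "doing task $1"
    sleep $(( RANDOM % 2 ))   # 0 or 1 second of "work"
    echo - >>"$log"           # record "finished", before the token goes back
}

# One token per allowed concurrent worker
for (( i = 0; i < max_jobs; i++ )); do echo T >&3; done

for task in {a..j}; do
    read -r -u 3                         # take a token, blocks when none is left
    ( worker "$task"; echo T >&3 ) &     # give it back when the worker is done
done
wait

# Replay the log: count running workers and remember the maximum
running=0 peak=0
while read -r ev; do
    if [[ $ev == + ]]; then (( running++ )); else (( running-- )); fi
    (( running > peak )) && peak=$running
done <"$log"
echo "peak concurrency: $peak (limit $max_jobs)"

exec 3>&-
rm -rf "$dir"
```

Because a worker writes its `-` before handing its token back, the replayed count can never exceed `max_jobs`, whatever the scheduling.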

Here is a transcript of a session:

sh$ ./master.sh 
Insert token
Insert token
Insert token
Insert token
Insert token
1405456428 PID3039 doing task a for 0
1405456428 PID3041 doing task b for 0
1405456428 PID3046 doing task e for 5
1405456428 PID3043 doing task c for 5
1405456428 PID3045 doing task d for 8
1405456428 PID3055 doing task f for 4
1405456428 PID3057 doing task g for 0
1405456428 PID3066 doing task h for 6
1405456432 PID3070 doing task i for 2
1405456433 PID3074 doing task j for 3
1405456433 PID3077 doing task k for 0
1405456433 PID3082 doing task l for 9
1405456434 PID3086 doing task m for 3
1405456434 PID3089 doing task n for 5
1405456436 PID3094 doing task o for 7
1405456436 PID3097 doing task p for 7
1405456437 PID3102 doing task q for 2
1405456439 PID3106 doing task r for 3
1405456439 PID3109 doing task s for 3
1405456442 PID3114 doing task t for 7
1405456442 PID3118 doing task u for 5
1405456442 PID3121 doing task v for 7
1405456443 PID3126 doing task w for 9
1405456443 PID3129 doing task x for 3
1405456446 PID3134 doing task y for 9
1405456447 PID3138 doing task z for 1

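All the tasks are launched within about 20s, and the whole run ends after about 27s (task y starts at 1405456446 and works for 9s), while the total "worked time" by the workers is 113s. If I'm not mistaken, that speed-up factor of a bit over 4 corresponds to the 5 workers working in parallel; it stays below 5 because the last few seconds run with fewer than 5 workers busy.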
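These figures can be recomputed from the transcript. A small bash/awk sketch, assuming every line has the form `<start epoch> PID<n> doing task <t> for <seconds>` and accepting the 1-second resolution of `date +%s`: it sums the sleep times, takes the earliest start, and takes the latest start-plus-sleep as the end of the run.

```shell
#!/bin/bash
# Transcript lines: <start epoch> PID<n> doing task <t> for <seconds>
transcript='1405456428 PID3039 doing task a for 0
1405456428 PID3041 doing task b for 0
1405456428 PID3046 doing task e for 5
1405456428 PID3043 doing task c for 5
1405456428 PID3045 doing task d for 8
1405456428 PID3055 doing task f for 4
1405456428 PID3057 doing task g for 0
1405456428 PID3066 doing task h for 6
1405456432 PID3070 doing task i for 2
1405456433 PID3074 doing task j for 3
1405456433 PID3077 doing task k for 0
1405456433 PID3082 doing task l for 9
1405456434 PID3086 doing task m for 3
1405456434 PID3089 doing task n for 5
1405456436 PID3094 doing task o for 7
1405456436 PID3097 doing task p for 7
1405456437 PID3102 doing task q for 2
1405456439 PID3106 doing task r for 3
1405456439 PID3109 doing task s for 3
1405456442 PID3114 doing task t for 7
1405456442 PID3118 doing task u for 5
1405456442 PID3121 doing task v for 7
1405456443 PID3126 doing task w for 9
1405456443 PID3129 doing task x for 3
1405456446 PID3134 doing task y for 9
1405456447 PID3138 doing task z for 1'

# Sum of sleep times, earliest start, latest (start + sleep)
read -r worked first last_end < <(
    awk '{ sum += $NF
           if (NR == 1 || $1 < first) first = $1
           if ($1 + $NF > last) last = $1 + $NF }
         END { print sum, first, last }' <<<"$transcript"
)
wall=$(( last_end - first ))

echo "total worked time: ${worked}s"
echo "wall-clock time:   about ${wall}s"
awk -v w="$worked" -v t="$wall" 'BEGIN { printf "speed-up: %.1f\n", w / t }'
```

This prints a worked time of 113s, a wall-clock time of about 27s and a speed-up of 4.2.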

Upvotes: 1
