Reputation: 7300
I've got this simple script below to stream compressed MySQL dumps to an Amazon S3 bucket in parallel:
#!/bin/bash
COMMIT_COUNT=0
COMMIT_LIMIT=2
for i in $(cat list.txt); do
    echo "$i "
    mysqldump -B $i | bzip2 -zc | gof3r put -b s3bucket -k $i.sql.bz2 &
    (( COMMIT_COUNT++ ))
    if [ ${COMMIT_COUNT} -eq ${COMMIT_LIMIT} ]; then
        COMMIT_COUNT=0
        wait
    fi
done
if [ ${COMMIT_COUNT} -gt 0 ]; then
    wait
fi
The output looks like this:
database1
database2
duration: 2.311823213s
duration: 2.317370326s
Is there a way to print this on one line for each dump?
database1 - duration: 2.311823213s
database2 - duration: 2.317370326s
The echo -n switch doesn't help in this case.
EDIT: Wed May 6 15:17:29 BST 2015
I was able to achieve the expected results based on the accepted answer:
echo "$i -" $(mysqldump -B $i| bzip2 -zc | gof3r put -b s3bucket -k $i.sql.bz2 2>&1) &
However, a command that is running in a subshell does not return its exit status to the parent shell because it runs in parallel, so I'm not able to verify whether it succeeded or failed.
Upvotes: 20
Views: 8511
Reputation: 3265
You are trying to do parallelization in your script. I'd recommend not reinventing the wheel but using a tried and tested tool instead: GNU parallel. The tutorial is huge: http://www.gnu.org/software/parallel/parallel_tutorial.html
It has different options for jobs that return a non-zero exit value: abort on the first error, or continue working until the end.
One advantage of GNU parallel over the OP's script is that it starts the third job as soon as the first one finishes.
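A minimal sketch of the OP's loop with GNU parallel (untested; assumes a reasonably recent version that supports the --halt option):
# Run at most two dumps at a time, reading database names from list.txt.
# --halt soon,fail=1 stops launching new jobs after the first failure;
# drop it to continue working until the end.
parallel -j 2 --halt soon,fail=1 \
    'mysqldump -B {} | bzip2 -zc | gof3r put -b s3bucket -k {}.sql.bz2' \
    :::: list.txt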
Upvotes: 0
Reputation: 51
I would make a separate function to control the whole process and then run that function in the background instead of running mysqldump itself.
By doing this you will have several processes running simultaneously, and at the same time you'll have control over each mysqldump as if it were run synchronously:
#!/bin/bash
do_job(){
    param=$1
    echo "job $param started..." >&2    # output to stderr, as stdout is captured
    sleep $(( RANDOM / 5000 ))
    echo $RANDOM                        # make some output
    [ $RANDOM -ge 16383 ]               # generate a random exit code
}
control_job() {
    param=$1
    output=$(do_job $param)
    exit_code=$?
    echo $param printed $output and exited with $exit_code
}
JOBS_COUNT=0
JOBS_LIMIT=2
for i in database1 database2 database3 database4; do
    control_job $i &
    (( JOBS_COUNT++ ))
    if [ $JOBS_COUNT -ge $JOBS_LIMIT ]; then
        (( JOBS_COUNT-- ))
        wait -n    # wait for any one job to exit (bash 4.3+)
    fi
done
wait    # wait for all remaining jobs
Here do_job is used in place of your mysqldump pipeline.
BTW, there's a small improvement here: you probably do not want to wait for all spawned processes when you've reached the limit. It is enough to wait for an arbitrary one, which is what wait -n does (available since bash 4.3).
Upvotes: 0
Reputation: 159
Expanding on your answer: to exit the script immediately upon failure, you have to save the pids of the background processes in an array. In your while loop, add pids[COMMIT_COUNT]=$! after the mysqldump command.
Then you could write a function to loop over all these pids, and exit if one of them failed:
wait_jobs() {
    for pid in "${pids[@]}"; do
        wait ${pid}
        status=$?
        if [ $status -ne 0 ]; then
            echo "ERROR: Backups failed"
            exit 1
        fi
    done
}
Call this function instead of wait $(jobs -p) in the script. You can replace the pids array with jobs -p in the for loop, but then you will not get the pids of jobs that completed before the loop is reached.
Note that the wait_jobs() function above cannot be used in a subshell; the exit 1 call would only terminate the subshell.
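A minimal demonstration of that caveat (not part of the script): exit inside ( ... ) only ends the subshell, so the next line still runs.
( exit 1 )                                   # the exit only terminates this subshell
echo "still here, subshell status was $?"    # prints: still here, subshell status was 1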
The complete script:
#!/bin/bash
COMMIT_COUNT=0
COMMIT_LIMIT=2
wait_jobs() {
    for pid in "${pids[@]}"; do
        wait ${pid}
        status=$?
        if [ $status -ne 0 ]; then
            echo "ERROR: Backups failed"
            exit 1
        fi
    done
}
while read -r i; do
    mysqldump -B $i | bzip2 -zc | gof3r put -b s3bucket -k $i.sql.bz2 |& xargs -I{} echo "$i - {}" &
    # save the pid of the background job so we can get the
    # exit status with wait $pid later
    pids[COMMIT_COUNT]=$!
    (( COMMIT_COUNT++ ))
    if [ ${COMMIT_COUNT} -eq ${COMMIT_LIMIT} ]; then
        COMMIT_COUNT=0
        wait_jobs
    fi
done < list.txt
wait_jobs
Upvotes: 3
Reputation: 7300
Thanks for all your help, but I think I've finally found an optimal solution for this.
Basically I used xargs to format the output so each entry (dump name + duration) is on one line. I also added the job spec to the wait command to get the exit status:
wait [n ...] Wait for each specified process and return its termination status. Each n may be a process ID or a job specification; if a job spec is given, all processes in that job's pipeline are waited for. If n is not given, all currently active child processes are waited for, and the return status is zero. If n specifies a non-existent process or job, the return status is 127. Otherwise, the return status is the exit status of the last process or job waited for.
Test:
# sh -c 'sleep 5; exit 1' &
[1] 29970
# wait; echo $?
0
# sh -c 'sleep 5; exit 1' &
[1] 29972
# wait $(jobs -p); echo $?
1
Final script:
#!/bin/bash
COMMIT_COUNT=0
COMMIT_LIMIT=2
while read -r i; do
    mysqldump -B $i | bzip2 -zc | gof3r put -b s3bucket -k $i.sql.bz2 |& xargs -I{} echo "$i - {}" &
    (( COMMIT_COUNT++ ))
    if [ ${COMMIT_COUNT} -eq ${COMMIT_LIMIT} ]; then
        COMMIT_COUNT=0
        wait $(jobs -p)
    fi
done < list.txt
if [ ${COMMIT_COUNT} -gt 0 ]; then
    wait $(jobs -p)
fi
if [ $? -ne 0 ]; then
    echo "ERROR: Backups failed"
    exit 1
fi
Upvotes: 5
Reputation: 1923
Regarding your additional question about exit status, let me write another answer. Because $() runs in a subshell, I don't think it is possible to return the exit status to the main shell the way a normal command would. But it is possible to write the exit status to a file to be examined later. Please try the command below. It will create a file called status-$i.txt containing two lines: one for mysqldump, the other for gof3r.
e="status-$i.txt"
echo -n > $e
echo "$i -" $( \
( mysqldump -B $i 2>&1; echo m=$? >> $e ) \
| bzip2 -zc \
| ( gof3r put -b s3bucket -k $i.sql.bz2 2>&1; echo g=$? >> $e ) \
) &
You may also need to clean up all status-*.txt files at the start of your script.
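Once the jobs have finished, you could scan those files for failures; here's a rough sketch of my own (assuming the m=/g= lines written by the command above):
# Any line not ending in =0 means mysqldump or gof3r failed for that dump.
for f in status-*.txt; do
    if grep -qv '=0$' "$f"; then
        echo "FAILED: $f" >&2
    fi
done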
Upvotes: 1
Reputation: 500
untested, etc.
#!/bin/sh
COMMIT_COUNT=0
COMMIT_LIMIT=2
_dump() {
    # better use gzip or xz; there's no benefit to using bzip2 afaict
    output="$(mysqldump -B "$1" | bzip2 -zc | gof3r put -b s3bucket -k "$1.sql.bz2" 2>&1)"
    [ "$?" != 0 ] && output="failed"
    printf "%s - %s\n" "$1" "$output"
}
while read -r i; do
    _dump "$i" &
    COMMIT_COUNT=$(( COMMIT_COUNT + 1 ))    # POSIX sh has no (( ... ))
    if [ ${COMMIT_COUNT} -eq ${COMMIT_LIMIT} ]; then
        COMMIT_COUNT=0
        wait
    fi
done < list.txt
wait
Upvotes: -3
Reputation: 1923
I think this command will do what you want:
echo "$i -" `(mysqldump -B $i | bzip2 -zc | gof3r put -b s3bucket -k $i.sql.bz2) 2>&1` &
Or, use $() in place of the backticks:
echo "$i -" $( (mysqldump -B $i| bzip2 -zc | gof3r put -b s3bucket -k $i.sql.bz2) 2>&1 ) &
The echo command will wait for the mysqldump pipeline to finish before trying to print its result together with $i. The subshell ( ... ) and the error redirection 2>&1 ensure that error messages go into the echoed output too. The space after the $( is necessary, because $(( without a space would be parsed as an arithmetic expansion, which is a different operation.
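A quick illustration of that parsing difference (my own sketch, not from the original answer):
echo $((1+2))        # arithmetic expansion: prints 3
echo $( (echo hi) )  # command substitution around a subshell: prints hi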
Upvotes: 7