Reputation: 189
I have a bash script which acts as a wrapper for an analysis pipeline. If the script errors out I want to be able to run the script from the point at which the errors occurred by simply re-running the original command. I have set two different traps; one which will remove the last file being generated on a non-zero exit from my script, the other will remove all the temporary files on exit signal = 0 and essentially cleans up the file system at the end of the run. I turned on noclobber in the bash environment which allows my script to skip over lines of the script where files have already been written but this will only do this if I do not set the non-zero exit trap. As soon as I set this trap then it will exit at the first line where noclobber IDs a file it will not overwrite. Is there a way for me to skip over lines of code that have successfully run previously rather than having to re-run my code from the start? I know I could use conditional statements for each line but I thought there might be a neater way of doing this.
set -o noclobber
# Function to clean up temporary folders when script exits at the end
rmfile() { rm -r $1 }
# Function to remove the file being currently generated
# Function executed if script errors out
rmlast() {
if [ ! -z "$CURRENTFILE" ]
then
rm -r $1
exit 1
fi }
# Trap to remove the currently generated file
trap 'rmlast "$CURRENTFILE"' ERR SIGINT
#Make temporary directory if it has not been created in a previous run
TEMPDIR=$(find . -name "tmp*")
if [ -z "$TEMPDIR" ]
then
TEMPDIR=$(mktemp -d /test/tmpXXX)
fi
# Set CURRENTFILE variable
CURRENTFILE="${TEMPDIR}/Variants.vcf"
# Set CURRENTFILE variable
complexanalysis_tool input_file > $CURRENTFILE
# Set CURRENTFILE variable
CURRENTFILE="${TEMPDIR}/Filtered.vcf"
complexanalysis_tool2 input_file2 > $CURRENTFILE
CURRENTFILE="${TEMPDIR}/Filtered_2.vcf"
complexanalysis_tool3 input_file3 > $CURRENTFILE
# Move files to final destination folder
mv -nv $TEMPDIR/*.vcf /test/newdest/
# Trap to remove temporary folders when script finishes running
trap 'rmfile "$TEMPDIR"' 0
Update:
I have been offered answers suggesting the use of the make utility. I want to make use of its inbuilt utility to check if a dependency has been fulfilled. In my hands the makefile suggested by VK Kashyap does not seem to skip execution for previously accomplished tasks. So for example I ran the script above and interrupted the script when it was running filtered.vcf with ctrl c. When I rerun the script again it runs from the beginning again i.e. starts again at varaints.vcf. Am I missing something in order to get the makefile to show sources as being fullfilled?
Answer to update:
OK this is a rookie mistake but since I am not familiar with generating makefiles I will post this explanation of my error. The reason my makefile was not rerunning from the exit point was that I had named the targets a different name to the output files being generated. So as VK Kashyap quite correctly answered if you name the targets eg.
variants.vcf
filtered.vcf
filtered2.vcf
the same as the output files being generated then the script will skip previously accomplished tasks.
Upvotes: 1
Views: 427
Reputation: 31284
there is no built-in way to do that.
however, you could brew something like that by keeping track of the last successful line and building your own goto
statement, as described here and in Is there a "goto" statement in bash? (just replace the 'labels' with actual line-numbers).
however, the question is whether this is really a smart idea.
a better way is to only run the commands needed, not the commands not-yet-executed. this could be done either by explicit conditionals in your bash-script:
produce_if_missing() {
# check if first argument is existing
# if not run the rest of the arguments and pipe it into the first one
local curfile=$1
shift
if [ ! -e "${curfile}" ]; then
$@ > "${curfile}"
fi
}
produce_if_missing Variants.vcf complexanalysis_tool input_file
produce_if_missing Filtered.vcf complexanalysis_tool2 input_file2
or using tools that are made for such things (see VK Kahyap's answer using make
, though i prefer using variables in the make-rules to minimize typos):
Variants.vcf: input_file
complexanalysis_tool $^ > $@
Filtered.vcf: input_file
complexanalysis_tool2 $^ > $@
Upvotes: 2
Reputation: 168
make utility might be an answer for the thing you want to achive.
it has inbuilt dependecy checking (the stuff which you are trying to achive with tmp files)
#run all target when all of the files are available
all: variants.vcf filtered.vcf filtered2.vcf
mv -nv $(TEMPDIR)/*.vcf /test/newdest/
variants.vcf:
complexanalysis_tool input_file > variants.vcf
filtered.vcf:
complexanalysis_tool2 input_file2 > filtered.vcf
filtered2.vcf:
complexanalysis_tool3 input_file3 > filtered2.vcf
you may use bash script to invoke this make file as:
#/bin/bash
export TEMPDIR=xyz
make -C $TEMPDIR all
make utility will check itself for already accomplished task and skip execution for done stuffs. it will continue where you had the error finishing the task.
you can find more details on internet about exact syntax for makefile.
Upvotes: 3