Reputation: 3022
The bash script below is almost complete. The only part I am struggling with is this: if the string "The bam file is corrupted and has been removed, please check log for reason." is written to process.log, then the corresponding .bam ($f in the script) should be removed. I added:
echo "The bam file is corrupted and has been removed, please check log for reason."
[[ -f "$f" ]] && rm -f "$f"
in an attempt to do this, but it looks like it removes the last .bam regardless. In the process.log below, NA19240.bam (the file whose entry has the search string) should have been removed, but it was not. Instead the last .bam in process.log (NS12911) was removed, even though the search string is not there for it. I am not able to fix this and need some expert help. I apologize for the lengthy post; I am just trying to add all the details. Thank you :).
bash
logfile=/home/cmccabe/Desktop/NGS/API/5-4-2016/process.log
for f in /home/cmccabe/Desktop/NGS/API/5-4-2016/*.bam ; do
echo "Start bam validation creation: $(date) - File: $f"
bname=`basename $f`
pref=${bname%%.bam}
bam validate --in $f --verbose 2> /home/cmccabe/Desktop/NGS/API/5-4-2016/bam_validation/${pref}_validation.txt
echo "End bam validation creation: $(date) - File: $f"
done >> "$logfile"
for file in /home/cmccabe/Desktop/NGS/API/5-4-2016/bam_validation/*.txt ; do
echo "Start verifying $(date) - File: $file"
bname=`basename $file`
if $(grep -iq "(SUCCESS)" "${file}"); then
echo "The verification of the bam file has completed successfully."
else
echo "The bam file is corrupted and has been removed, please check log for reason."
[[ -f "$f" ]] && rm -f "$f"
echo "End of bam file verification: $(date) - File: ${file}"
fi
done >> "$logfile"
process.log
Start bam validation creation: Fri May 6 13:20:48 CDT 2016 - File: /home/cmccabe/Desktop/NGS/API/5-4-2016/NA12878.bam
End bam validation creation: Fri May 6 13:24:15 CDT 2016 - File: /home/cmccabe/Desktop/NGS/API/5-4-2016/NA12878.bam
Start bam validation creation: Fri May 6 13:24:15 CDT 2016 - File: /home/cmccabe/Desktop/NGS/API/5-4-2016/NA19240.bam
End bam validation creation: Fri May 6 13:24:15 CDT 2016 - File: /home/cmccabe/Desktop/NGS/API/5-4-2016/NA19240.bam
Start bam validation creation: Fri May 6 13:24:15 CDT 2016 - File: /home/cmccabe/Desktop/NGS/API/5-4-2016/NS12911.bam
End bam validation creation: Fri May 6 13:28:03 CDT 2016 - File: /home/cmccabe/Desktop/NGS/API/5-4-2016/NS12911.bam
Start verifying Fri May 6 13:28:03 CDT 2016 - File: /home/cmccabe/Desktop/NGS/API/5-4-2016/bam_validation/NA12878_validation.txt
The verification of the bam file has completed successfully.
End of bam file verification: Fri May 6 13:28:03 CDT 2016 - File: /home/cmccabe/Desktop/NGS/API/5-4-2016/bam_validation/NA12878_validation.txt
Start verifying Fri May 6 13:28:03 CDT 2016 - File: /home/cmccabe/Desktop/NGS/API/5-4-2016/bam_validation/NA19240_validation.txt
The bam file is corrupted and has been removed, please check log for reason.
End of bam file verification: Fri May 6 13:28:05 CDT 2016 - File: /home/cmccabe/Desktop/NGS/API/5-4-2016/bam_validation/NA19240_validation.txt
Start verifying Fri May 6 13:28:05 CDT 2016 - File: /home/cmccabe/Desktop/NGS/API/5-4-2016/bam_validation/NS12911_validation.txt
The verification of the bam file has completed successfully.
End of bam file verification: Fri May 6 13:28:05 CDT 2016 - File: /home/cmccabe/Desktop/NGS/API/5-4-2016/bam_validation/NS12911_validation.txt
Reputation: 4822
It is a little hard for me to completely replicate your environment, so I am having to make some assumptions about your setup and exactly what your constraints are. I see many ways the process could be simplified or made more efficient, but instead of introducing too many unneeded changes, I mainly focused on making the script work.
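With that said, I did rearrange the processing so that each ${pref}_validation.txt is verified right after it is created. In your version the second loop iterates over $file, so $f is never updated there and still holds the last .bam from the first loop, which is why the last .bam was the one removed.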
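To make the stale variable easier to see, here is a minimal sketch. The file names are just placeholders taken from your log, and nothing is validated or deleted; it only prints which $f each pass of the second loop would act on:

```bash
#!/bin/bash
# Placeholder file names, used only to illustrate the stale loop variable.
for f in NA12878.bam NA19240.bam NS12911.bam; do
    :   # the validation step would run here
done

for file in NA12878_validation.txt NA19240_validation.txt NS12911_validation.txt; do
    # This loop only updates $file; $f still holds the last value
    # assigned by the loop above.
    echo "$file -> $f"
done
```

Every line printed pairs the report with NS12911.bam, because that was the final value of $f when the first loop ended.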
Can you try the following (note: Updated the script. First time around I got going a little too fast and copied the wrong version.) and let me know what the result is:
#!/bin/bash
logfile="/home/cmccabe/Desktop/NGS/API/5-4-2016/process.log"
for f in /home/cmccabe/Desktop/NGS/API/5-4-2016/*.bam ; do
echo "Start bam validation creation: $(date) - File: $f"
bname="$(basename "$f")"
pref="${bname%%.bam}"
bam validate --in "$f" --verbose 2> "/home/cmccabe/Desktop/NGS/API/5-4-2016/bam_validation/${pref}_validation.txt"
echo "End bam validation creation: $(date) - File: $f"
file="/home/cmccabe/Desktop/NGS/API/5-4-2016/bam_validation/${pref}_validation.txt"
echo "Start verifying $(date) - File: $file"
if grep -iq "(SUCCESS)" "${file}"; then
echo "The verification of the bam file has completed successfully."
else
if [[ -f "$f" ]]; then
rm -f "$f"
echo "The bam file is corrupted and has been removed, please check log for reason."
fi
fi
echo "End of bam file verification: $(date) - File: ${file}"
done >> "$logfile"
Hopefully combining the two steps in the one for loop does not deviate from some process requirement you have. What I find helpful about it is that it allows for a more streamlined code flow and the log file should now read like:
Start bam validation creation: Fri May 6 13:20:48 CDT 2016 - File: /home/cmccabe/Desktop/NGS/API/5-4-2016/NA12878.bam
End bam validation creation: Fri May 6 13:24:15 CDT 2016 - File: /home/cmccabe/Desktop/NGS/API/5-4-2016/NA12878.bam
Start verifying Fri May 6 13:28:03 CDT 2016 - File: /home/cmccabe/Desktop/NGS/API/5-4-2016/bam_validation/NA12878_validation.txt
The verification of the bam file has completed successfully.
End of bam file verification: Fri May 6 13:28:03 CDT 2016 - File: /home/cmccabe/Desktop/NGS/API/5-4-2016/bam_validation/NA12878_validation.txt
Start bam validation creation: Fri May 6 13:24:15 CDT 2016 - File: /home/cmccabe/Desktop/NGS/API/5-4-2016/NA19240.bam
...
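If the two loops do need to stay separate for some reason, another option would be to work out the .bam path from the name of the validation report instead of relying on $f. This is only a sketch: bam_for_validation is a helper name I made up, and it assumes the <dir>/bam_validation/<name>_validation.txt layout from your script.

```bash
#!/bin/bash
# bam_for_validation is a hypothetical helper: it maps a validation report
# path back to the .bam it was generated from, assuming the report lives in
# <dir>/bam_validation/ and is named <name>_validation.txt.
bam_for_validation() {
    local report=$1
    local name bam_dir
    name=$(basename "$report")                   # e.g. NA19240_validation.txt
    name=${name%_validation.txt}                 # e.g. NA19240
    bam_dir=$(dirname "$(dirname "$report")")    # parent of bam_validation/
    printf '%s/%s.bam\n' "$bam_dir" "$name"
}

bam_for_validation "/home/cmccabe/Desktop/NGS/API/5-4-2016/bam_validation/NA19240_validation.txt"
```

Inside your second loop you could then do something like bam="$(bam_for_validation "$file")" and remove "$bam" rather than "$f".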
Highly Modified Version
I took a try at a highly streamlined and more resilient version of the script. I would be interested if you could check this one as well:
#!/bin/bash
# basepath allows you to quickly move the script by updating this path
basepath="/home/cmccabe/Desktop/NGS/API/5-4-2016"
# give the logfile a name
logfile="${basepath}/process.log"
# for each .bam file in basepath do
for f in "${basepath}"/*.bam ; do
# validate the file with the bam command
# capture the stdout, stderr and return code via some crazy bash fu
eval "$({ cmd_err=$({ cmd_out=$( \
bam validate --in "$f" --verbose \
); cmd_rtn=$?; } 2>&1; declare -p cmd_out cmd_rtn >&2); declare -p cmd_err; } 2>&1)"
# check the return code for positive completion
if [ "${cmd_rtn}" -eq "0" ]; then
printf -- "%s - bam validation completed for: %s\n" "$(date)" "${f}"
# check for string "(SUCCESS)" in the captured output; the first script read
# the report from stderr (2>), so search both streams to be safe
if grep -iq "(SUCCESS)" <<< "${cmd_out}"$'\n'"${cmd_err}"; then
printf -- "%s - Verification of the bam file has completed successfully.\n" "$(date)"
else
# verify the bam file exists and can be deleted
if [[ -f "$f" ]] && rm -f "$f" ; then
printf -- "%s - The bam file is corrupted and has been removed, please check log for reason.\n" "$(date)"
else
printf -- "%s - WARNING: The bam file is corrupted but the file could not be deleted.\n" "$(date)"
fi
fi
else
# The bam validate command above did not complete with a
# satisfactory result. This should not really ever happen unless
# the bam command does not exist or some serious error occurred
# when executing the bam command.
# Consider additional actions in addition to logging the outcome
printf -- "%s - WARNING: bam validation failed for file: %s - [%s]\n" "$(date)" "${f}" "${cmd_err}"
fi
done >> "$logfile"
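Since the eval line is the hard part to read, here is the same capture pattern on its own with a stand-in command, so it can be tried without any .bam files. fake_cmd is something I made up for the demonstration; it is not part of the bam tool.

```bash
#!/bin/bash
# fake_cmd is a hypothetical stand-in for `bam validate`: it writes one line
# to stdout, one line to stderr, and exits with status 3.
fake_cmd() {
    echo "to stdout"
    echo "to stderr" >&2
    return 3
}

# Same capture pattern as in the script: the inner $( ) collects stdout, the
# middle $( ) collects stderr, and declare -p serializes the variables so
# that eval can recreate them in the current shell.
eval "$({ cmd_err=$({ cmd_out=$(fake_cmd); cmd_rtn=$?; } 2>&1; declare -p cmd_out cmd_rtn >&2); declare -p cmd_err; } 2>&1)"

printf 'stdout=[%s] stderr=[%s] status=[%s]\n' "$cmd_out" "$cmd_err" "$cmd_rtn"
```

This prints stdout=[to stdout] stderr=[to stderr] status=[3]. The declare -p round trip is needed because assignments made inside $( ) happen in a subshell and would otherwise be lost.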