justaguy

Reputation: 3022

bash to remove file in directory if search string found in another file

The bash below is almost complete. The only part I am struggling with is this: if the string "The bam file is corrupted and has been removed, please check log for reason." is found in process.log, the corresponding .bam ($f in the bash) should be removed. I added:

echo "The bam file is corrupted and has been removed, please check log for reason."
             [[ -f "$f" ]] && rm -f "$f"

in an attempt to do this, but it looks like it removes the last .bam regardless. In the process.log, NA19240.bam is the file with the search string, so it should have been removed, but it was not. Instead, the last .bam in the process.log (NS12911) is removed, even though the search string is not there for it. I am not able to fix this and need some expert help. I apologize for the lengthy post, just trying to add all the details. Thank you :).

bash

logfile=/home/cmccabe/Desktop/NGS/API/5-4-2016/process.log
for f in /home/cmccabe/Desktop/NGS/API/5-4-2016/*.bam ; do
 echo "Start bam validation creation: $(date) - File: $f"
 bname=`basename $f`
 pref=${bname%%.bam}
 bam validate --in $f --verbose 2> /home/cmccabe/Desktop/NGS/API/5-4-2016/bam_validation/${pref}_validation.txt
 echo "End bam validation creation: $(date) - File: $f"
done >> "$logfile"
for file in /home/cmccabe/Desktop/NGS/API/5-4-2016/bam_validation/*.txt ; do
 echo "Start verifying $(date) - File: $file"
 bname=`basename $file`
 if $(grep -iq "(SUCCESS)" "${file}"); then
    echo "The verification of the bam file has completed successfully."
else
    echo "The bam file is corrupted and has been removed, please check log for reason."
             [[ -f "$f" ]] && rm -f "$f"
    echo "End of bam file verification: $(date) - File: ${file}"
fi
done >> "$logfile"

process.log

 Start bam validation creation: Fri May  6 13:20:48 CDT 2016 - File: /home/cmccabe/Desktop/NGS/API/5-4-2016/NA12878.bam
 End bam validation creation: Fri May  6 13:24:15 CDT 2016 - File: /home/cmccabe/Desktop/NGS/API/5-4-2016/NA12878.bam
 Start bam validation creation: Fri May  6 13:24:15 CDT 2016 - File: /home/cmccabe/Desktop/NGS/API/5-4-2016/NA19240.bam
 End bam validation creation: Fri May  6 13:24:15 CDT 2016 - File: /home/cmccabe/Desktop/NGS/API/5-4-2016/NA19240.bam
 Start bam validation creation: Fri May  6 13:24:15 CDT 2016 - File: /home/cmccabe/Desktop/NGS/API/5-4-2016/NS12911.bam
 End bam validation creation: Fri May  6 13:28:03 CDT 2016 - File: /home/cmccabe/Desktop/NGS/API/5-4-2016/NS12911.bam
 Start verifying Fri May  6 13:28:03 CDT 2016 - File: /home/cmccabe/Desktop/NGS/API/5-4-2016/bam_validation/NA12878_validation.txt
 The verification of the bam file has completed successfully.
 End of bam file verification: Fri May  6 13:28:03 CDT 2016 - File: /home/cmccabe/Desktop/NGS/API/5-4-2016/bam_validation/NA12878_validation.txt
 Start verifying Fri May  6 13:28:03 CDT 2016 - File: /home/cmccabe/Desktop/NGS/API/5-4-2016/bam_validation/NA19240_validation.txt
 The bam file is corrupted and has been removed, please check log for reason.
 End of bam file verification: Fri May  6 13:28:05 CDT 2016 - File: /home/cmccabe/Desktop/NGS/API/5-4-2016/bam_validation/NA19240_validation.txt
 Start verifying Fri May  6 13:28:05 CDT 2016 - File: /home/cmccabe/Desktop/NGS/API/5-4-2016/bam_validation/NS12911_validation.txt
 The verification of the bam file has completed successfully.
 End of bam file verification: Fri May  6 13:28:05 CDT 2016 - File: /home/cmccabe/Desktop/NGS/API/5-4-2016/bam_validation/NS12911_validation.txt

Upvotes: 0

Views: 54

Answers (1)

John Mark Mitchell

Reputation: 4822

It is a little hard for me to completely replicate your environment, so I am having to make some assumptions about your setup and exactly what your constraints are. I see many ways the process could be simplified or made more efficient, but instead of introducing too many unneeded changes, I mainly focused on making the script work.

With that said, I rearranged the processing so that each ${pref}_validation.txt is verified right after it is created.
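For context on why the original script removed the wrong file: a bash for loop variable keeps its last value after the loop ends, so $f inside the second loop still points to the last .bam seen by the first loop. A minimal sketch of that behavior, using made-up file names (nothing is read from or written to disk):

```bash
#!/bin/bash
# Hypothetical file names, used only to illustrate how the loop variable behaves.
for f in NA12878.bam NA19240.bam NS12911.bam; do
    :   # the validation step would run here
done

# The second loop iterates over a different variable ("file"),
# so "$f" is never updated and still holds the last value from above.
for file in NA12878_validation.txt NA19240_validation.txt; do
    echo "checking $file, but \$f is still: $f"
done
```

Every iteration of the second loop reports the same value of $f, which is why an rm on "$f" there always targets the last .bam.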

Can you try the following and let me know what the result is? (Note: I updated the script. The first time around I went a little too fast and copied the wrong version.)

#!/bin/bash

logfile="/home/cmccabe/Desktop/NGS/API/5-4-2016/process.log"

for f in /home/cmccabe/Desktop/NGS/API/5-4-2016/*.bam ; do
    echo "Start bam validation creation: $(date) - File: $f"
    bname="$(basename "$f")"
    pref="${bname%%.bam}"
    bam validate --in "$f" --verbose 2> "/home/cmccabe/Desktop/NGS/API/5-4-2016/bam_validation/${pref}_validation.txt"
    echo "End bam validation creation: $(date) - File: $f"

    file="/home/cmccabe/Desktop/NGS/API/5-4-2016/bam_validation/${pref}_validation.txt"

    echo "Start verifying $(date) - File: $file"

    if grep -iq "(SUCCESS)" "${file}"; then
        echo "The verification of the bam file has completed successfully."
    else
        if [[ -f "$f" ]]; then
            rm -f "$f"
            echo "The bam file is corrupted and has been removed, please check log for reason."
        fi
    fi

    echo "End of bam file verification: $(date) - File: ${file}"

done >> "$logfile"

Hopefully combining the two steps into one for loop does not deviate from some process requirement you have. What I find helpful about it is that it allows for a more streamlined code flow, and the log file should now read like:

Start bam validation creation: Fri May  6 13:20:48 CDT 2016 - File: /home/cmccabe/Desktop/NGS/API/5-4-2016/NA12878.bam
End bam validation creation: Fri May  6 13:24:15 CDT 2016 - File: /home/cmccabe/Desktop/NGS/API/5-4-2016/NA12878.bam
Start verifying Fri May  6 13:28:03 CDT 2016 - File: /home/cmccabe/Desktop/NGS/API/5-4-2016/bam_validation/NA12878_validation.txt
The verification of the bam file has completed successfully.
End of bam file verification: Fri May  6 13:28:03 CDT 2016 - File: /home/cmccabe/Desktop/NGS/API/5-4-2016/bam_validation/NA12878_validation.txt
Start bam validation creation: Fri May  6 13:24:15 CDT 2016 - File: /home/cmccabe/Desktop/NGS/API/5-4-2016/NA19240.bam
...
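If the two loops do need to stay separate, another option is to work out the .bam path from the validation file name inside the second loop instead of relying on $f. A rough sketch, assuming the reports are always named <prefix>_validation.txt and sit in a bam_validation directory directly under the directory holding the .bam files; the function name bam_for_report and the /data/run1 path are made up for illustration:

```bash
#!/bin/bash
# Hypothetical helper: map a validation report path back to its .bam path.
# Assumes <bam_dir>/bam_validation/<prefix>_validation.txt  ->  <bam_dir>/<prefix>.bam
bam_for_report() {
    local report="$1"
    local bname pref bam_dir
    bname="$(basename "$report")"                 # e.g. NA19240_validation.txt
    pref="${bname%_validation.txt}"               # e.g. NA19240
    bam_dir="$(dirname "$(dirname "$report")")"   # parent of bam_validation/
    printf '%s/%s.bam\n' "$bam_dir" "$pref"
}

# Example with an illustrative path:
bam_for_report "/data/run1/bam_validation/NA19240_validation.txt"
```

In the second loop, the result could then be stored in a variable and used in place of "$f" for the existence check and the rm.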

Highly Modified Version
I also took a stab at a more streamlined and more resilient version of the script. I would be interested to hear whether this one works for you as well:

#!/bin/bash

# basepath allows you to quickly move the script by updating this path
basepath="/home/cmccabe/Desktop/NGS/API/5-4-2016"

# give the logfile a name
logfile="${basepath}/process.log"

# for each .bam file in basepath do
for f in "${basepath}"/*.bam ; do

    # validate the file with the bam command
    # capture the stdout, stderr and return code via some crazy bash fu
    eval "$({ cmd_err=$({ cmd_out=$( \
        bam validate --in "$f" --verbose \
      ); cmd_rtn=$?; } 2>&1; declare -p cmd_out cmd_rtn >&2); declare -p cmd_err; } 2>&1)"

    # check the return code for positive completion
    if [ "${cmd_rtn}" -eq "0" ]; then
        printf -- "%s - bam validation completed for: %s\n" "$(date)" "${f}"

        # check for string "(SUCCESS)" in bam command standard output 
        if grep -iq "(SUCCESS)" <<< "${cmd_out}"; then
            printf -- "%s - Verification of the bam file has completed successfully.\n" "$(date)"
        else
            # verify the bam file exists and can be deleted
            if [[ -f "$f" ]] && rm -f "$f" ; then
                printf -- "%s - The bam file is corrupted and has been removed, please check log for reason.\n" "$(date)"
            else
                printf -- "%s - WARNING: The bam file is corrupted but the file could not be deleted.\n" "$(date)"
            fi
        fi
    else
        # The bam validate command above did not complete with a
        # satisfactory result. This should not really ever happen unless
        # the bam command does not exist or some serious error occurred
        # when executing the bam command.
        # Consider additional actions beyond logging the outcome
        printf -- "%s - WARNING: bam validation failed for file: %s - [%s]\n" "$(date)" "${f}" "${cmd_err}"
    fi

done >> "$logfile"
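If the eval line above is hard to follow, the same three values (stdout, stderr and return code) can also be captured with a temporary file. This is only a sketch of the general technique, not a tested replacement: fake_validate is a made-up stand-in for the real bam validate call, and the variable names simply mirror the ones used above.

```bash
#!/bin/bash
# Made-up stand-in for "bam validate": writes to stdout and stderr, returns non-zero.
fake_validate() {
    echo "report for $1"
    echo "problem with $1" >&2
    return 3
}

# Temporary file that receives the command's stderr.
err_file="$(mktemp)"

# stdout goes into cmd_out, stderr into the temp file;
# the if/else records the command's exit status in cmd_rtn.
if cmd_out="$(fake_validate "sample.bam" 2> "$err_file")"; then
    cmd_rtn=0
else
    cmd_rtn=$?
fi

cmd_err="$(< "$err_file")"
rm -f "$err_file"

printf 'rc=%s out=[%s] err=[%s]\n' "$cmd_rtn" "$cmd_out" "$cmd_err"
```

The trade-off is one temp file per run in exchange for code that is easier to read and debug than the nested command substitutions.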

Upvotes: 1
