Reputation: 386
I have a shell script that works well when I run it from the command line, but it fails when I call it from a rule within Snakemake.
The script loops over a file of identifiers and uses each one to grep the matching sequences from a FASTQ file, then runs a multiple sequence alignment and builds a consensus.
Here is the script. I placed some echo statements in it, and for some reason it doesn't call the commands: it stops at the grep statement.
I have tried adding set +o pipefail; in the rule, but that doesn't work either.
#!/bin/bash
function Usage(){
    echo -e "\
Usage: $(basename $0) -r|--read2 -l|--umi-list -f|--outfile \n\
where: ... \n\
" >&2
    exit 1
}
# Check argument count
[[ "$#" -lt 2 ]] && Usage
# parse arguments
while [[ "$#" -gt 1 ]];do
    case "$1" in
        -r|--read2)
            READ2="$2"
            shift
            ;;
        -l|--umi-list)
            UMI="$2"
            shift
            ;;
        -f|--outfile)
            OUTFILE="$2"
            shift
            ;;
        *)
            Usage
            ;;
    esac
    shift
done
# Set defaults
# Check arguments
[[ -f "${READ2}" ]] || (echo "Cannot find input file ${READ2}, exiting..." >&2; exit 1)
[[ -f "${UMI}" ]] || (echo "Cannot find input file ${UMI}, exiting..." >&2; exit 1)
#Create output directory
OUTDIR=$(dirname "${OUTFILE}")
[[ -d "${OUTDIR}" ]] || (set -x; mkdir -p "${OUTDIR}")
# Make temporary directories
TEMP_DIR="${OUTDIR}/temp"
[[ -d "${TEMP_DIR}" ]] || (set -x; mkdir -p "${TEMP_DIR}")
#RUN consensus script
for f in $( more "${UMI}" | cut -f1);do
    NAME=$(echo $f)
    grep "${NAME}" "${READ2}" | cut -f1 -d ' ' | sed 's/@M/M/' > "${TEMP_DIR}/${NAME}.name"
    echo subsetting reads
    seqtk subseq "${READ2}" "${TEMP_DIR}/${NAME}.name" | seqtk seq -A > "${TEMP_DIR}/${NAME}.fasta"
    ~/software/muscle3.8.31_i86linux64 -msf -in "${TEMP_DIR}/${NAME}.fasta" -out "${TEMP_DIR}/${NAME}.muscle.fasta"
    echo make consensus
    ~/software/EMBOSS-6.6.0/emboss/cons -sequence "${TEMP_DIR}/${NAME}.muscle.fasta" -outseq "${TEMP_DIR}/${NAME}.cons.fasta"
    sed -i 's/n//g' "${TEMP_DIR}/${NAME}.cons.fasta"
    sed -i "s/EMBOSS_001/${NAME}.cons/" "${TEMP_DIR}/${NAME}.cons.fasta"
done
cat "${TEMP_DIR}/*.cons.fasta" > "${OUTFILE}"
Snakemake rule:
rule make_consensus:
    input:
        r2=get_extracted,
        lst="{prefix}/{sample}/reads/cell_barcode_umi.count"
    output:
        fasta="{prefix}/{sample}/reads/fasta/{sample}.R2.consensus.fa"
    shell:
        "sh ./scripts/make_consensus.sh -r {input.r2} -l {input.lst} -f {output.fasta}"
Edit: Snakemake error messages (I changed some of the paths to neutral file paths):
RuleException:
CalledProcessError in line 29 of ~/user/scripts/consensus.smk:
Command ' set -euo pipefail; sh ./scripts/make_consensus.sh -r ~/user/file.extracted.fastq -l ~/user/cell_barcode_umi.count -f ~/user/file.consensus.fa ' returned non-zero exit status 1.
File "~/user/scripts/consensus.smk", line 29, in __rule_make_consensus
File "~/user/miniconda3/lib/python3.6/concurrent/futures/thread.py", line 56, in run
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
If there are better ways to do this than using a shell for loop please let me know!
thanks!
Edit
Script run standalone: first grep command
grep AGGCCGTTCT_TGTGGATG R_extracted/wgs_5_OL_debug.R2.extracted.fastq | cut -f1 -d ' ' | sed 's/@M/M/' > ./fasta/temp/AGGCCGTTCT_TGTGGATG.name
Script run through Snakemake: first 2 grep statements
grep :::::::::::::: R_extracted/wgs_5_OL_debug.R2.extracted.fastq | cut -f1 -d ' ' | sed 's/@M/M/' > ./fasta/temp/::::::::::::::.name
I'm now trying to figure out where those :::: in Snakemake are coming from. All ideas welcome.
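One possible source of the :::::::::::::: pattern, offered as an assumption rather than a confirmed diagnosis: more is a pager, and it can print banner lines made of colons around file names when it is not writing to a terminal. Reading the identifier file directly avoids the pager altogether. The sketch below assumes bash and a tab-separated file with the identifier in the first column; the temporary file and its second row are made up for illustration.

```shell
#!/bin/bash
# Sketch only: the count file below is a hypothetical stand-in for
# cell_barcode_umi.count, and the second identifier is invented.
set -euo pipefail

tmp=$(mktemp -d)
printf 'AGGCCGTTCT_TGTGGATG\t12\nTTTTAAAACC_GGGGCCCC\t7\n' > "$tmp/cell_barcode_umi.count"

# Read the first tab-separated column line by line, without a pager.
names=()
while IFS=$'\t' read -r name _rest; do
    [[ -n "$name" ]] || continue      # skip empty lines
    names+=("$name")
    echo "would grep for: ${name}"
done < "$tmp/cell_barcode_umi.count"
```

In the original script the loop body (grep, seqtk, muscle, cons) would take the place of the echo line.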
Upvotes: 3
Views: 1059
Reputation: 9062
"It stops at the grep statement"
My guess is that the grep command in make_consensus.sh doesn't capture anything. grep returns exit code 1 in such cases, and the non-zero exit status propagates to Snakemake (see also "Handling SIGPIPE error in snakemake").
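The exit-status behaviour can be reproduced in isolation. This is a minimal sketch, assuming bash; the FASTQ record and the identifier NO_SUCH_UMI are made up for illustration. It enables the same set -euo pipefail options that appear in front of the command in the Snakemake error message.

```shell
#!/bin/bash
# Sketch only: the file contents and the identifier are hypothetical.
set -euo pipefail

tmp=$(mktemp -d)
printf '@M01 AAAA_CCCC\nACGT\n+\nIIII\n' > "$tmp/reads.fastq"

# Unguarded: grep matches nothing and exits 1; with pipefail the whole
# pipeline reports that status even though cut succeeds.
unguarded=0
( grep "NO_SUCH_UMI" "$tmp/reads.fastq" | cut -f1 -d ' ' > "$tmp/unguarded.name" ) || unguarded=$?
echo "unguarded pipeline status: ${unguarded}"

# Guarded: "|| true" turns "no match" into success, leaving an empty file.
guarded=0
{ grep "NO_SUCH_UMI" "$tmp/reads.fastq" || true; } | cut -f1 -d ' ' > "$tmp/guarded.name" || guarded=$?
echo "guarded pipeline status: ${guarded}"
```

Note that || true also hides genuine grep errors such as an unreadable input file, so it is only appropriate where an empty result is acceptable.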
Loosely related: there is an inconsistency between the shebang of make_consensus.sh, which says the script should be executed with bash (#!/bin/bash), and the actual execution using sh (sh ./scripts/make_consensus.sh). Where sh is a link to bash this makes little practical difference, but on systems where sh is a different shell (dash on Debian and Ubuntu, for example), bash-only constructs such as [[ ... ]] can fail.
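Whether sh is actually bash on a given machine can be checked rather than assumed. The following is a small sketch, assuming bash is installed and that readlink supports -f (as GNU coreutils does); the variable names are illustrative only.

```shell
#!/bin/bash
# Sketch only: checks what `sh` points to and shows a bash-only construct.
set -euo pipefail

# Resolve the binary behind `sh` (bash on some systems, dash on others).
sh_target=$(readlink -f "$(command -v sh)")
echo "sh resolves to: ${sh_target}"

# [[ ... ]] is a bash extension (used throughout make_consensus.sh); it
# works when bash is requested explicitly.
bash_result=$(bash -c '[[ "abc" == a* ]] && echo match')
echo "bash: ${bash_result}"
```

Calling the script as bash ./scripts/make_consensus.sh in the rule would make the invocation match the shebang.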
Upvotes: 1