Reputation: 35
I have the following snakefile:
import os
SAMPLES = [i.replace('sample_', '') for i in os.listdir('final_raw')]
rule all:
input:
expand('analysis/results/unmapped/hybrid/read1_{sample}_hy.fastq', sample = SAMPLES)
rule trimming:
input:
read1 = 'final_raw/sample_{sample}/read1_sample_{sample}.fastq.gz',
read2 = 'final_raw/sample_{sample}/read2_sample_{sample}.fastq.gz',
barcodes = 'final_bc.fasta'
output:
'analysis/trimmed_data/{sample}_read_1.fastq.gz',
'analysis/trimmed_data/{sample}_read_2.fastq.gz'
threads: 6
shell:
'flexbar -threads {threads} '
'-r {input.read1} '
'-p {input.read2} '
'-aa Nextera -br {input.barcodes} '
'-t analysis/trimmed_data/{wildcards.sample}_read '
'-z GZ'
rule create_hybrid_genome:
input:
human = 'genomes/human/GRCh38.fna',
covid = 'genomes/cov/cov_genome.fna'
output:
'genomes/hybrid/hybrid.fna'
shell:
'cat {input.human} {input.covid} > {output}'
rule hybrid_indexing:
input:
rules.create_hybrid_genome.output
output:
'genomes/hybrid/hybrid.fna.amb'
shell:
'bwa index {input}'
rule map_hybrid:
input:
read1 = 'analysis/trimmed_data/{sample}_read_1.fastq.gz',
read2 = 'analysis/trimmed_data/{sample}_read_2.fastq.gz',
genome = 'genomes/hybrid/hybrid.fna'
output:
'analysis/results/mapped/hybrid/{sample}.sorted.bam'
shell:
'bwa mem -t 8 {input.genome} {input.read1} {input.read2} | '
'samtools sort -@ 2 -o {wildcards.sample}.sorted.bam && '
'samtools index -@ 8 {wildcards.sample}.sorted.bam && '
'mv {wildcards.sample}.sorted.* analysis/results/mapped/hybrid'
rule unmapped_hybrid:
input:
rules.map_hybrid.output
output:
'analysis/results/unmapped/hybrid/um_{sample}.bam'
shell:
'samtools view -b -f 4 {input} > um_{wildcards.sample}.bam && '
'mv um_{wildcards.sample}.bam analysis/results/unmapped/hybrid'
rule fastq_hybrid:
input:
rules.unmapped_hybrid.output
output:
read1 = 'analysis/results/unmapped/hybrid/read1_{sample}_hy.fastq',
read2 = 'analysis/results/unmapped/hybrid/read2_{sample}_hy.fastq'
shell:
'samtools fastq '
'-1 {output.read1} '
'-2 {output.read2} '
'-0 /dev/null '
'-s /dev/null '
'-n {input}'
I am trying to add a rule 'kraken' that runs the following for loop:
for FILE in $(ls analysis/results/unmapped/hybrid/read1_*_hy.fastq | sed 's/read1_*_hy.fastq//'); do \
kraken2 --db {input.database} \
--memory-mapping \
--threads 8 \
--use-names \
--report analysis/results/kraken2/hybrid/{{}}${{FILE}}.kraken \
--paired read1_${{FILE}}_hy.fastq read2_${{FILE}}_hy.fastq \
--unclassified-out /analysis/results/unclassified/hybrid/{{}}${{FILE}}_uc#.fastq; done
the for loop is responsible for running all .fastq files at once for the kraken database to be stored in memory and run the classification for all samples. how can this be achieved without referring to the regular snakemake shell that takes wildcards and runs the command for each sample?
the normal way is to have the following:
rule kraken2:
input:
database = 'kraken2_db'
read1 = 'analysis/results/unmapped/hybrid/read1_{sample}_hy.fastq'
read2 = 'analysis/results/unmapped/hybrid/read2_{sample}_hy.fastq'
output:
'analysis/results/kraken2/hybrid/{sample}.kraken'
shell:
'kraken2 '
'--db {input.database} '
'--threads 8 '
'--paired '
'--use-names '
'--unclassified-out {wildcards.sample}_uc#.fq '
'--report {output} '
'{input.read1} {input.read2}'
and rule all updated to the following:
rule all:
input:
expand('analysis/results/kraken2/hybrid/{sample}.kraken', sample = SAMPLES)
this will run each sample at a time and I am struggeling with running the for loop in the shell, and most of the time getting the error:
MissingInputException in rule all in line 5 of /home/stud9/NAS/snakefile: Missing input files for rule all: affected files:
any suggestions?
Upvotes: 0
Views: 113
Reputation: 2079
Not sure this is the problem, nor why you need to run them at once, but the sed command in your for loop is matching read1_*_hy.fastq
, e.g. read1
followed by 0 or more _
followed by _hy.fastq
. You probably want it to be
$(ls analysis/results/unmapped/hybrid/read1_*_hy.fastq | sed 's/read1_(.*)_hy.fastq/\1/')
To get the sample names from the filenames. Check by echoing the FILE and command you hope to run to see if your bash looks good.
Upvotes: 0
Reputation: 150
I am not sure your loop script is working correctly. However, I will try to answer your question to run all fastq samples together.
ls analysis/results/unmapped/hybrid/
Here are my example files
read1_sample1_hy.fastq read2_sample1_hy.fastq
read1_sample2_hy.fastq read2_sample2_hy.fastq
To run all the samples at once you can use expand()
in the rule.
rule all:
input:
'analysis/results/kraken2/hybrid/merged_samples.kraken'
rule kraken2:
input:
read1 = expand("analysis/results/unmapped/hybrid/read1_{sample}_hy.fastq",sample = SAMPLES),
read2 = expand("analysis/results/unmapped/hybrid/read2_{sample}_hy.fastq", sample = SAMPLES)
output:
"analysis/results/kraken2/hybrid/merged_samples.kraken"
shell:
"""
echo kraken2 {input.read1} {input.read2} > {output}
"""
Snakemake will run all the read1 files and all read2 files together.
Upvotes: 0