Reputation: 892
My apologies for another Nextflow post. I'd like to create an index of a reference genome. I have two scripts: main.nf and index.nf.
main.nf
params.hg38genome = "/Users/name/Downloads/NM.fasta"
params.outDir = "./output"

include { create_index } from './index.nf'

workflow {
    create_index(params.hg38genome)
}
I have the following code in index.nf:
process create_index {
    tag { sample_id }
    publishDir "${params.outDir}/hg38index/", mode:"copy"
    debug true

    input:
    path( params.hg38genome)

    output:
    path("${params.outDir}_refhg38.fai"), emit: hg38_fasta_index

    script:
    """
    echo "hello $params.hg38genome $params.outDir \n"
    bwa index $params.hg38genome
    """
}
First, I am unable to get any value in sample_id. Second, I get this error:
Caused by: Missing output file(s) ./output_refhg38.fai expected by process create_index (null)
If I run bwa directly as:
bwa index input.fasta
I get the following files in the directory where input.fasta is located:
input.fasta.ann
input.fasta.amb
input.fasta.sa
input.fasta.bwt
input.fasta.pac
How do I get Nextflow to create the output folder and, within it, NM.fasta.X, where X is ann, etc.? Also, it doesn't extract the name NM.fasta: I tried ${params.hg38genome}.baseName, but that failed.
Upvotes: 1
Views: 554
Reputation: 54502
You get that error because you've declared an output file that could not be found in your process working directory. Note that the FASTA index file (i.e. the .fai file) is not actually an output of bwa index. You might be thinking of samtools faidx, which does indeed create the FASTA index .fai file (samtools index is for alignment files such as BAM, not for FASTA). If your next step is to align some reads, you don't even need the FASTA index file; you only need the BWA index files. For example:
Contents of main.nf:
params.reads = '/Users/name/Downloads/tiny/normal/*_R{1,2}_xxx.fastq.gz'
params.hg38genome = '/Users/name/Downloads/NM.fasta'

include { bwa_index } from './bwa.nf'
include { bwa_mem } from './bwa.nf'

workflow {
    reads = Channel.fromFilePairs( params.reads )
    hg38genome = file( params.hg38genome )

    bwa_index( hg38genome, hg38genome.name )
    bwa_mem( reads, bwa_index.out )

    bwa_mem.out.view()
}
Contents of bwa.nf:
process bwa_index {

    input:
    path ref_fasta
    val prefix

    output:
    tuple val(prefix), path("${prefix}.{ann,amb,sa,bwt,pac}")

    """
    bwa index \\
        -p "${prefix}" \\
        "${ref_fasta}"
    """
}

process bwa_mem {

    tag { sample_id }

    input:
    tuple val(sample_id), path(reads)
    tuple val(idxbase), path("bwa_index/*")

    output:
    tuple val(sample_id), path("${sample_id}.aln.bam")

    script:
    def task_cpus = task.cpus > 1 ? task.cpus - 1 : task.cpus

    """
    bwa mem \\
        -t ${task_cpus} \\
        "bwa_index/${idxbase}" \\
        ${reads} |
    samtools view \\
        -1 \\
        -o "${sample_id}.aln.bam" \\
        -
    """
}
Contents of nextflow.config:
params {
    outdir = './results'
}

process {

    withName: bwa_index {
        publishDir = [
            path: "${params.outdir}/bwa_index",
            mode: 'copy',
        ]
        cpus = 1
        conda = 'bwakit=0.7.17-dev1'
    }

    withName: bwa_mem {
        publishDir = [
            path: "${params.outdir}/bwa_mem",
            mode: 'copy',
        ]
        cpus = 8
        conda = 'bwakit=0.7.17-dev1'
    }
}

conda {
    enabled = true
}
Results:
$ nextflow run main.nf -ansi-log false
N E X T F L O W ~ version 23.04.1
Launching `main.nf` [determined_jang] DSL2 - revision: 1f32b172b7
Creating env using conda: bwakit=0.7.17-dev1 [cache /path/to/work/conda/env-c67b42794b99b0cecbbb27e78e7f5fb7]
[f6/10ac19] Submitted process > bwa_index
[7a/76be98] Submitted process > bwa_mem (foo)
[foo, /path/to/work/7a/76be98292ced9ca7e418470227b34a/foo.aln.bam]
[de/9d97f1] Submitted process > bwa_mem (baz)
[baz, /path/to/work/de/9d97f1270464644149ec0b64903a49/baz.aln.bam]
[4a/83079c] Submitted process > bwa_mem (bar)
[bar, /path/to/work/4a/83079c009dd7f28306c1f5108426ff/bar.aln.bam]
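Coming back to the two problems in the question: the tag shows (null) because sample_id is never declared as an input, and the output block asks for a .fai file that bwa index never writes. A minimal sketch of how the original create_index process could be adjusted is below. It assumes you want to keep params.outDir from the question; the input name ref_fasta, the emit name bwa_index, and the extra samtools_faidx process are illustrative names of mine, and the latter is only needed if some later step really requires the .fai file.

```nextflow
// index.nf (sketch): declare the files that bwa index actually writes.
process create_index {

    // Tag with the staged file name, since there is no sample_id here.
    tag { ref_fasta.name }

    publishDir "${params.outDir}/hg38index", mode: 'copy'

    input:
    // A named path input: the FASTA is staged into the task work directory.
    path ref_fasta

    output:
    // bwa index writes <fasta>.amb, .ann, .bwt, .pac and .sa next to the input.
    path "${ref_fasta}.{amb,ann,bwt,pac,sa}", emit: bwa_index

    script:
    """
    bwa index "${ref_fasta}"
    """
}

// Hypothetical extra process, only if the FASTA index (.fai) is needed.
process samtools_faidx {

    tag { ref_fasta.name }

    publishDir "${params.outDir}/hg38index", mode: 'copy'

    input:
    path ref_fasta

    output:
    // samtools faidx writes <fasta>.fai next to the input.
    path "${ref_fasta}.fai", emit: fasta_index

    script:
    """
    samtools faidx "${ref_fasta}"
    """
}
```

With this, calling create_index( file(params.hg38genome) ) in the workflow should publish NM.fasta.amb, NM.fasta.ann and so on under ./output/hg38index. Inside the process, ref_fasta.name gives NM.fasta and ref_fasta.baseName gives NM; the attempt in the question fails because params.hg38genome is a plain string, and the .baseName was written outside the ${...} expression.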
Upvotes: 3