nextflow: create index - get path

Question

Apologies for another Nextflow post. I'd like to create an index of a reference genome. I have two scripts: main.nf and index.nf.

main.nf

params.hg38genome ="/Users/name/Downloads/NM.fasta"
params.outDir = "./output"

include {create_index} from './index.nf'

workflow {
create_index(params.hg38genome)    
}

I have the following code in index.nf:

process create_index {

    tag { sample_id }

    publishDir "${params.outDir}/hg38index/", mode:"copy"
debug true

    input:
path( params.hg38genome)

    output:
    path("${params.outDir}_refhg38.fai"), emit: hg38_fasta_index

    script:


    """
    echo "hello $params.hg38genome $params.outDir  
"
bwa index $params.hg38genome

    """
}

First, I am unable to get any value in sample_id. Second, I get this error:

Caused by: Missing output file(s) ./output_refhg38.fai expected by process create_index (null)

If I run bwa directly as: bwa index input.fasta

I get the following files in the directory where input.fasta is located:

input.fasta.ann  
input.fasta.amb     
input.fasta.sa  
input.fasta.bwt     
input.fasta.pac

How do I get Nextflow to create the output folder and, within it, NM.fasta.X, where X is ann, etc.? Also, it doesn't extract NM.fasta: I tried ${params.hg38genome}.baseName, but that failed.

Steve · Accepted Answer

You get that error because you've declared an output file that could not be found in your process working directory. Note that the FASTA index file (i.e. the .fai file) is not actually an output of bwa index. You might be thinking of samtools faidx, which does create the FASTA index .fai file (samtools index, by contrast, indexes alignment files such as BAM). If your next step is to align some reads, you don't even need the FASTA index file; you only need the BWA index files. For example:

Contents of main.nf:

params.reads = '/Users/name/Downloads/tiny/normal/*_R{1,2}_xxx.fastq.gz'
params.hg38genome = '/Users/name/Downloads/NM.fasta'

include { bwa_index } from './bwa.nf'
include { bwa_mem } from './bwa.nf'


workflow {

    reads = Channel.fromFilePairs( params.reads )

    hg38genome = file( params.hg38genome )

    bwa_index( hg38genome, hg38genome.name )
    bwa_mem( reads, bwa_index.out )

    bwa_mem.out.view()
}
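A note on the file() call in the workflow above, since the question mentions that ${params.hg38genome}.baseName failed: params.hg38genome is a plain string, so inside a Groovy string the expression ${params.hg38genome}.baseName most likely just prints the path followed by the literal text ".baseName". Converting the string with file() gives a path object with name, baseName and extension attributes. A minimal sketch, reusing the parameter value from the question (the println lines are only for illustration):

```nextflow
params.hg38genome = '/Users/name/Downloads/NM.fasta'

workflow {
    // file() turns the plain String parameter into a path object
    def hg38genome = file( params.hg38genome )

    println hg38genome.name       // NM.fasta
    println hg38genome.baseName   // NM
    println hg38genome.extension  // fasta
}
```

This is why the workflow passes hg38genome.name as the index prefix: the BWA index files then come out as NM.fasta.ann, NM.fasta.amb, and so on.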

Contents of bwa.nf:

process bwa_index {

    input:
    path ref_fasta
    val prefix

    output:
    tuple val(prefix), path("${prefix}.{ann,amb,sa,bwt,pac}")

    """
    bwa index \
        -p "${prefix}" \
        "${ref_fasta}"
    """
}

process bwa_mem {

    tag { sample_id }

    input:
    tuple val(sample_id), path(reads)
    tuple val(idxbase), path("bwa_index/*")

    output:
    tuple val(sample_id), path("${sample_id}.aln.bam")

    script:
    def task_cpus = task.cpus > 1 ? task.cpus - 1 : task.cpus

    """
    bwa mem \
        -t ${task_cpus} \
        "bwa_index/${idxbase}" \
        ${reads} |
    samtools view \
       -1 \
       -o "${sample_id}.aln.bam" \
       -
    """
}

Contents of nextflow.config:

params {

    outdir = './results'
}

process {

    withName: bwa_index {

        publishDir = [
            path: "${params.outdir}/bwa_index",
            mode: 'copy',
        ]
        cpus = 1
        conda = 'bwakit=0.7.17-dev1'
    }

    withName: bwa_mem {

        publishDir = [
            path: "${params.outdir}/bwa_mem",
            mode: 'copy',
        ]
        cpus = 8
        conda = 'bwakit=0.7.17-dev1'
    }
}

conda {

    enabled = true
}

Results:

$ nextflow run main.nf -ansi-log false
N E X T F L O W  ~  version 23.04.1
Launching `main.nf` [determined_jang] DSL2 - revision: 1f32b172b7
Creating env using conda: bwakit=0.7.17-dev1 [cache /path/to/work/conda/env-c67b42794b99b0cecbbb27e78e7f5fb7]
[f6/10ac19] Submitted process > bwa_index
[7a/76be98] Submitted process > bwa_mem (foo)
[foo, /path/to/work/7a/76be98292ced9ca7e418470227b34a/foo.aln.bam]
[de/9d97f1] Submitted process > bwa_mem (baz)
[baz, /path/to/work/de/9d97f1270464644149ec0b64903a49/baz.aln.bam]
[4a/83079c] Submitted process > bwa_mem (bar)
[bar, /path/to/work/4a/83079c009dd7f28306c1f5108426ff/bar.aln.bam]
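If a later step does need the FASTA index (.fai) file, it could be produced by a separate process. The following is only a sketch: the process name samtools_faidx is hypothetical and not part of the files above, and it assumes samtools is available in the task environment (for example via the same conda setting used for bwa_mem).

```nextflow
// Hypothetical process, not part of the original answer:
// builds the FASTA index (.fai) with samtools faidx.
process samtools_faidx {

    input:
    path ref_fasta

    output:
    // samtools faidx writes <input name>.fai next to the staged input,
    // i.e. into the task working directory
    path "${ref_fasta}.fai"

    script:
    """
    samtools faidx "${ref_fasta}"
    """
}
```

It would be called from the workflow as samtools_faidx( hg38genome ). Note also that this process has no tag directive: in the question's create_index process no input defines sample_id, which is probably why the process name in the error message ends in (null).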
