raygozag

Reputation: 218

path not being detected by Nextflow

I'm new to nf-core/Nextflow, and the documentation does not always seem to reflect what is actually implemented. I'm defining the basic pipeline below:

    nextflow.enable.dsl=2


    process RUNBLAST {
        input:
        val thr
        path query
        path db
        path output

        output:
        path output

        script:
        """
        blastn -query ${query} -db ${db} -out ${output} -num_threads ${thr}
        """
    }

    workflow {
        //println "I want to BLAST $params.query to $params.dbDir/$params.dbName using $params.threads CPUs and output it to $params.outdir"

        RUNBLAST(params.threads, params.query, params.dbDir, params.output)
    }

Then I execute the pipeline with:

nextflow run main.nf --query test2.fa --dbDir blast/blastDB

Then I get the following error:

N E X T F L O W  ~  version 22.10.6
Launching `main.nf` [dreamy_hugle] DSL2 - revision: c388cf8f31
Error executing process > 'RUNBLAST'
Error executing process > 'RUNBLAST'

Caused by:
  Not a valid path value: 'test2.fa'


Tip: you can replicate the issue by changing to the process work dir and entering the command bash .command.run

I know test2.fa exists in the current directory:

(nfcore) MN:nf-core-basicblast jraygozagaray$ ls
CHANGELOG.md        conf            other.nf
CITATIONS.md        docs            pyproject.toml
CODE_OF_CONDUCT.md  lib         subworkflows
LICENSE         main.nf         test.fa
README.md       modules         test2.fa
assets          modules.json        work
bin         nextflow.config     workflows
blast           nextflow_schema.json

I also tried "file" instead of "path", but that is deprecated and raises other kinds of errors.

It would be helpful to know how to fix this so I can get started building the pipeline.

Shouldn't Nextflow copy the file to the execution path?

Thanks

Upvotes: 1

Views: 3243

Answers (1)

Steve

Reputation: 54502

You get the above error because params.query is not actually a path value; it is most likely a plain String or GString, which a path input does not accept. The solution is to supply a file object instead, for example:

workflow {

    query = file(params.query)

    BLAST( query, ... )
}
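
Applied to the pipeline in the question, the fix might look like the sketch below. It is not a drop-in replacement: the defaults for params.threads, params.dbName and params.output are hypothetical values chosen for illustration, and the output file name is passed as a val rather than a path because the file does not exist before the task runs. It also assumes the BLAST database files share a common prefix (params.dbName) inside the directory given by --dbDir.

```groovy
nextflow.enable.dsl=2

// Hypothetical defaults, not taken from the question; override on the command line.
params.threads = 1
params.dbName  = 'mydb'       // assumed BLAST database prefix inside --dbDir
params.output  = 'blast.out'  // assumed output file name

process RUNBLAST {
    input:
    val thr
    path query
    path db
    val db_name
    val out_name   // a name only: the file is created by the task itself

    output:
    path "${out_name}"

    script:
    """
    blastn -query ${query} -db ${db}/${db_name} -out ${out_name} -num_threads ${thr}
    """
}

workflow {
    // file() converts the plain strings held in params into path objects,
    // which is what a `path` input expects.
    query = file(params.query)
    db    = file(params.dbDir)

    RUNBLAST(params.threads, query, db, params.dbName, params.output)
}
```

With path objects as inputs, Nextflow stages the query file and the database directory into the task work directory (as symlinks by default), so the command can refer to them by their staged names.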

Note that a value channel is implicitly created by a process when it is invoked with a simple value, like the above file object. If you need to be able to BLAST multiple query files, you'll instead need a queue channel, which can be created using the fromPath factory method, for example:

params.query = "${baseDir}/data/*.fa"
params.db = "${baseDir}/blastdb/nt"
params.outdir = './results'

db_name = file(params.db).name
db_path = file(params.db).parent


process BLAST {

    publishDir(
        path: "${params.outdir}/blast",
        mode: 'copy',
    )

    input:
    tuple val(query_id), path(query)
    path db

    output:
    tuple val(query_id), path("${query_id}.out")

    """
    blastn \\
        -num_threads ${task.cpus} \\
        -query "${query}" \\
        -db "${db}/${db_name}" \\
        -out "${query_id}.out"
    """
}

workflow {

    Channel
        .fromPath( params.query )
        .map { file -> tuple(file.baseName, file) }
        .set { query_ch }

    BLAST( query_ch, db_path )
}

Note that the usual way to specify the number of threads/CPUs is with the cpus directive, which can be configured using a process selector in your nextflow.config. For example:

process {

    withName: BLAST {
        cpus = 4
    }
}
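
The directive can also be given a default inside the process body, and the script reads whatever value is in effect through task.cpus. A minimal sketch, using a hypothetical process named SHOW_CPUS that is not part of the pipeline above:

```groovy
nextflow.enable.dsl=2

// SHOW_CPUS is a hypothetical process used only to illustrate the directive.
process SHOW_CPUS {
    cpus 2   // process-level default; the local executor needs at least 2 CPUs available

    output:
    stdout

    script:
    """
    echo "running with ${task.cpus} thread(s)"
    """
}

workflow {
    SHOW_CPUS()
    SHOW_CPUS.out.view()   // print the captured stdout of the task
}
```

As far as I know, a matching withName selector in nextflow.config takes precedence over the value set in the process body, so the in-process value works as a fallback rather than a hard setting.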

Upvotes: 2
