zillur rahman

Reputation: 395

Missing output file(s) expected by nextflow process

I have a Nextflow process that takes multiple files as input, processes them, and then outputs some files. Inside the process, I conditionally remove empty files.

    process imputation {
    input:
    set val(chrom),val(chunk_array),val(chunk_start),val(chunk_end),path(in_haps),path(refs),path(maps) from imp_ch
    output:
    tuple val("${chrom}"),path("${chrom}.*") into imputed
    script:
    def (haps,sample)=in_haps
    def (haplotype, legend, samples)=refs
    """
    impute4 -g "${haps}" -h "${haplotype}" -l "${legend}" -m "${maps}" -o "${chrom}.imputed.chunk${chunk_array}" -no_maf_align -o_gz -int "${chunk_start}" "${chunk_end}" -Ne 20000 -buffer 1000 -seed 54321
    if [[ \$(gunzip -c "${chrom}.imputed.chunk${chunk_array}.gen.gz" | head -c1 | wc -c) == "0" ]]
    then
     rm "${chrom}.imputed.chunk${chunk_array}.gen.gz"
    else
     qctools -g "${chrom}.imputed.chunk${chunk_array}.gen.gz" -snp-stats -osnp "${chrom}.imputed.chunk${chunk_array}.snp.stats"
    fi
    """
    }

The process works fine. The impute4 program writes *gen.gz output files, some of which may be empty. I added the if statement to remove those empty files, because qctools cannot read empty files and the process crashes. The problem is that I now get this error:

Missing output file(s) `chr16*` expected by process `imputation (165)` (note: input files are not included in the default matching set)

How can I resolve this issue? Any help would be appreciated.

Upvotes: 0

Views: 2424

Answers (2)

Patrick H.

Reputation: 293

Using the optional pattern suggested by user jfy133 would be one way to solve your issue. In any case, you might want to split the two commands into separate processes.

You could also store the number of lines, or the result of the test you used in your if clause, and apply the Nextflow filter or branch operators to the output channel of your first process before running qctools.

Filter:

Channel
    .from( 1, 2, 3, 4, 5 )
    .filter { it % 2 == 1 }

Branch:

Channel
    .from(1,2,3,40,50)
    .branch {
        small: it < 10
        large: it > 10
    }
    .set { result }

 result.small.view { "$it is small" }
 result.large.view { "$it is large" }

Your solution might then look like this:

process imputation {
    input:
        ...
    output:
        tuple env(isempty), file(other), file(output) into imputed

    script:
        def (haps,sample)=in_haps
        def (haplotype, legend, samples)=refs
        """
        impute4 <your parameters>
        isempty=\$(gunzip -c "${chrom}.imputed.chunk${chunk_array}.gen.gz" | head -c1 | wc -c)
        """
}

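// isempty holds the byte count from `wc -c` as a string, so convert it before comparing;
// a count above 0 means the .gen.gz file has content and should go on to qctools
filtered_imputed = imputed.filter { it[0].toInteger() > 0 }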

process qctools {
    input:
        val(isempty), <your input> from filtered_imputed
    output:
        <your desired output> into qctools_output

    script:
    """
    qctools <your parameters>
    """
}

Upvotes: 1

jfy133

Reputation: 101

Would this nextflow pattern help?

Short version:

process foo {
  output:
  file 'foo.txt' optional true into foo_ch

  script:
  '''
  your_command
  '''
}

Basically, by specifying that the output is optional, the process doesn't fail if it finds nothing matching the defined output glob.

However, depending on how many files are output, you may wish to be more specific in your output declaration about which output files are required and which are optional, so that your process still fails if all commands fail (for whatever reason).
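Applied to the process in the question, a minimal sketch might look like the following. It assumes the DSL1 syntax used in the question and that `imp_ch` is defined elsewhere in the asker's pipeline; the only intended change is `optional true` on the output declaration (plus a space before `]]` in the bash test). It has not been run against impute4 or qctools.

```groovy
process imputation {
    input:
    set val(chrom), val(chunk_array), val(chunk_start), val(chunk_end), path(in_haps), path(refs), path(maps) from imp_ch

    output:
    // 'optional true' lets the task finish successfully when the glob matches nothing,
    // which is the case when the empty .gen.gz was removed and qctools was skipped
    tuple val(chrom), path("${chrom}.*") optional true into imputed

    script:
    def (haps, sample) = in_haps
    def (haplotype, legend, samples) = refs
    """
    impute4 -g "${haps}" -h "${haplotype}" -l "${legend}" -m "${maps}" -o "${chrom}.imputed.chunk${chunk_array}" -no_maf_align -o_gz -int "${chunk_start}" "${chunk_end}" -Ne 20000 -buffer 1000 -seed 54321
    if [[ \$(gunzip -c "${chrom}.imputed.chunk${chunk_array}.gen.gz" | head -c1 | wc -c) == "0" ]]
    then
        rm "${chrom}.imputed.chunk${chunk_array}.gen.gz"
    else
        qctools -g "${chrom}.imputed.chunk${chunk_array}.gen.gz" -snp-stats -osnp "${chrom}.imputed.chunk${chunk_array}.snp.stats"
    fi
    """
}
```

As far as I understand, a task whose optional output is missing simply emits nothing into `imputed`, so downstream processes only receive the chunks that actually produced files.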

Upvotes: 0
