Reputation: 395
I have a Nextflow process that takes multiple input files, processes them, and then outputs some files. Inside the process, I remove empty files under a condition.
process imputation {
    input:
    set val(chrom),val(chunk_array),val(chunk_start),val(chunk_end),path(in_haps),path(refs),path(maps) from imp_ch
    output:
    tuple val("${chrom}"),path("${chrom}.*") into imputed
    script:
    def (haps,sample)=in_haps
    def (haplotype, legend, samples)=refs
    """
    impute4 -g "${haps}" -h "${haplotype}" -l "${legend}" -m "${maps}" -o "${chrom}.imputed.chunk${chunk_array}" -no_maf_align -o_gz -int "${chunk_start}" "${chunk_end}" -Ne 20000 -buffer 1000 -seed 54321
    if [[ \$(gunzip -c "${chrom}.imputed.chunk${chunk_array}.gen.gz" | head -c1 | wc -c) == "0" ]]
    then
        rm "${chrom}.imputed.chunk${chunk_array}.gen.gz"
    else
        qctools -g "${chrom}.imputed.chunk${chunk_array}.gen.gz" -snp-stats -osnp "${chrom}.imputed.chunk${chunk_array}.snp.stats"
    fi
    """
}
The process itself works fine. The impute4 program outputs *gen.gz files, some of which may be empty. I added the if statement to remove those empty files, because qctools cannot read empty files and the process crashes. The problem is that I now get this error:
Missing output file(s) `chr16*` expected by process `imputation (165)` (note: input files are not included in the default matching set)
How can I resolve this issue?
Upvotes: 0
Views: 2424
Reputation: 293
Using the optional pattern, as suggested by user jfy133, would be one way to solve your issue. In any case, you might want to split the two commands into separate processes.
You could also store the number of lines, or the result of the test you used in your if clause, and apply the Nextflow filter or branch operators to the output channel of your first process before running qctools:
Channel
    .from( 1, 2, 3, 4, 5 )
    .filter { it % 2 == 1 }

Channel
    .from(1,2,3,40,50)
    .branch {
        small: it < 10
        large: it > 10
    }
    .set { result }
result.small.view { "$it is small" }
result.large.view { "$it is large" }
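Before wiring the test from the if clause into a channel, it can be checked on its own in a plain shell. This is only a minimal sketch, assuming gzip, head, and wc are available; the helper name `first_byte_count` and the file names are hypothetical. Outside a Nextflow script block the `$` does not need a backslash.

```shell
# Hypothetical helper: print 1 if the gzip file decompresses to at least
# one byte, and 0 if it decompresses to nothing.
first_byte_count() {
    # tr strips the padding that some wc implementations add
    gunzip -c "$1" | head -c1 | wc -c | tr -d '[:space:]'
}

# Two throwaway files for the demo (names are illustrative only).
workdir=$(mktemp -d)
: | gzip -c > "$workdir/empty.gen.gz"
printf 'rs1 A G\n' | gzip -c > "$workdir/full.gen.gz"

for f in "$workdir"/*.gen.gz; do
    if [ "$(first_byte_count "$f")" = "0" ]; then
        echo "$(basename "$f"): empty, skip qctools"
    else
        echo "$(basename "$f"): has data"
    fi
done
```

The printed 0 or 1 is the kind of value a downstream filter or branch would act on.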
Your solution might then look like this:
process imputation {
    input:
    ...
    output:
    tuple env(isempty), file(other), file(output) into imputed
    script:
    def (haps,sample)=in_haps
    def (haplotype, legend, samples)=refs
    """
    impute4 <your parameters>
    isempty=\$(gunzip -c "${chrom}.imputed.chunk${chunk_array}.gen.gz" | head -c1 | wc -c)
    """
}
// isempty is a byte count passed as a string (0 means empty), so convert it before comparing
filtered_imputed = imputed.filter { it[0].toInteger() > 0 }
process qctools {
    input:
    tuple val(isempty), <your input> from filtered_imputed
    output:
    <your desired output> into qctools_output
    script:
    """
    qctools <your parameters>
    """
}
Upvotes: 1
Reputation: 101
Would this Nextflow pattern help?
Short version:
process foo {
    output:
    file 'foo.txt' optional true into foo_ch
    script:
    '''
    your_command
    '''
}
Basically, by specifying that the output is optional, the process doesn't fail if nothing matches the defined output glob.
However, depending on how many files are output, you may wish to be more specific in your output declaration about which output files are required and which are optional, to ensure that your process still fails if all commands fail (for whatever reason).
Upvotes: 0