My data is structured as samples that are run in batches. So I have a directory hierarchy like this:
/path/to/dir/batch_1/sample_1
/path/to/dir/batch_1/sample_2
/path/to/dir/batch_1/...
/path/to/dir/batch_2/sample_1
/path/to/dir/batch_2/sample_2
/path/to/dir/batch_2/...
/path/to/dir/...
I want to apply a process to every sample for a given subset of batches. One approach that works is to generate a channel listing the samples:
path_to_samples = Channel
    .fromPath( ['/path/to/dir/batch_2/sample_*',
                '/path/to/dir/batch_322/sample_*'], type: 'dir' )

process my_process {
    input:
    path(sample) from path_to_samples

    """
    do stuff
    """
}
Now I'd like to provide the batch names separately and have the script find the corresponding samples. Something like this:
params.root_dir = '/path/to/dir/'
params.batch_names = Channel.from('batch_2', 'batch_322')

// make samples channel: incorrect
path_to_samples = params.batch_names
    .map { params.root_dir + it + 'sample_*' }
    .toPath()

process my_process {
    input:
    path(sample) from path_to_samples

    """
    do stuff
    """
}
So, am I thinking about channels incorrectly? Is there a way to "flatten" the sample list through channel operations? Or is the correct approach to write a more complex Groovy closure that lists the files in each batch directory and returns them as a tuple or list?
I'm not sure how you'd like to provide your input batch names, but you could build a list of glob patterns with a simple closure and then use that list to create your input channel:
params.root_dir = '/path/to/dir'
params.batch_names = '/path/to/batch_names.txt'

// the text file lists one batch name per line
batch_names = file(params.batch_names)
sample_dirs = batch_names.readLines().collect { "${params.root_dir}/${it}/sample_*" }

samples = Channel.fromPath( sample_dirs, type: 'dir' )

process my_process {
    input:
    path(sample) from samples

    """
    ls -l "${sample}"
    """
}
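If you'd rather keep the batch names in a channel, as in your second attempt, the "flattening" you asked about can be done with the `flatMap` operator. This is only a sketch under a few assumptions: it uses the same DSL1 syntax as the code above, takes the batch names from a hypothetical comma-separated `params.batch_list` (instead of a text file), and assumes `files()` accepts the same `type: 'dir'` option as `Channel.fromPath`. `files()` is used rather than `file()` because it always returns a list, which is what `flatMap` expects to flatten.

```groovy
// Sketch in DSL1 syntax. params.batch_list is a hypothetical
// comma-separated list of batch names.
params.root_dir   = '/path/to/dir'
params.batch_list = 'batch_2,batch_322'

samples = Channel
    .fromList( params.batch_list.tokenize(',') )
    // one glob pattern per batch name
    .map { batch -> "${params.root_dir}/${batch}/sample_*" }
    // files() returns a list of matching paths; flatMap emits each one
    // as a separate item. Assumption: files() honours type: 'dir'.
    .flatMap { pattern -> files(pattern, type: 'dir') }

process my_process {
    input:
    path(sample) from samples

    """
    ls -l "${sample}"
    """
}
```

The batches can then be chosen on the command line, e.g. `--batch_list batch_2,batch_322`.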
I would be inclined to just leave the input glob pattern as a param, though. This approach offers the most flexibility, but may not suit your use case:
params.samples = '/path/to/dir/batch_{2,322}/sample_*'
samples = Channel.fromPath( params.samples, type: 'dir' )
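A middle ground, if you still want to pass the batch names separately, is to join them into a single brace glob like the one above and hand that to `Channel.fromPath`. A minimal sketch, assuming a hypothetical comma-separated `params.batch_list` param:

```groovy
// params.batch_list is hypothetical: comma-separated batch names,
// e.g. set with --batch_list batch_2,batch_322
params.root_dir   = '/path/to/dir'
params.batch_list = 'batch_2,batch_322'

// with the defaults above this builds '/path/to/dir/{batch_2,batch_322}/sample_*'
pattern = "${params.root_dir}/{${params.batch_list}}/sample_*"

samples = Channel.fromPath( pattern, type: 'dir' )
```

This keeps a single `Channel.fromPath` call while still letting you list the batches by name.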