Alexlok

Reputation: 3134

Expand paths from a channel

My data is structured as samples that are run in batches. So I have a directory hierarchy like this:

/path/to/dir/batch_1/sample_1
/path/to/dir/batch_1/sample_2
/path/to/dir/batch_1/...
/path/to/dir/batch_2/sample_1
/path/to/dir/batch_2/sample_2
/path/to/dir/batch_2/...
/path/to/dir/...

I want to apply a process to every sample for a given subset of batches. One approach that works is to generate a channel listing the samples:

path_to_samples = Channel
    .fromPath(['/path/to/dir/batch_2/sample_*',
               '/path/to/dir/batch_322/sample_*'], type: 'dir' )

process my_process{

    input:
    path(sample) from path_to_samples

    """
    do stuff
    """
}

Now, I'd like to provide the batch names separately and have the script find the corresponding samples. Something like this:

params.root_dir = '/path/to/dir/'
params.batch_names = Channel.from('batch_2', 'batch_322')

// make samples channel: incorrect
path_to_samples = params.batch_names
                        .map { params.root_dir + it + 'sample_*' }
                        .toPath()

process my_process{

    input:
    path(sample) from path_to_samples

    """
    do stuff
    """
}

So, am I thinking incorrectly about channels? Is there a way to "flatten" the sample list through channel operations? Or is the correct approach to write a more complex Groovy closure that lists the files in each batch directory and returns them as a tuple or list?

Upvotes: 1

Views: 595

Answers (1)

Steve

Reputation: 54502

Not sure how you'd like to provide your input batch names, but you could create your list of glob patterns using a simple closure then use them to create your input channel:

params.root_dir = '/path/to/dir'
params.batch_names = '/path/to/batch_names.txt'

batch_names = file(params.batch_names)
sample_dirs = batch_names.readLines().collect { "${params.root_dir}/${it}/sample_*" }

samples = Channel.fromPath( sample_dirs, type: 'dir' )

process my_process{

    input:
    path(sample) from samples

    """
    ls -l "${sample}"
    """
}

I would be inclined to just leave the input glob pattern as a param, though. This approach offers the most flexibility, but may not suit your use case:

params.samples = '/path/to/dir/batch_{2,322}/sample_*'

samples = Channel.fromPath( params.samples, type: 'dir' )
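
On the "flatten" part of the question: if you'd rather keep the batch names in a channel, something along these lines might work. This is only a sketch. It assumes the batch names arrive as a comma-separated string (a hypothetical form of `params.batch_names`, not what the question uses), and it relies on `file()` returning the list of matching paths when given a glob, with `flatMap` then emitting each path as its own item:

```groovy
params.root_dir = '/path/to/dir'
// Assumption: batch names given as a comma-separated string,
// e.g. on the command line with --batch_names 'batch_2,batch_322'
params.batch_names = 'batch_2,batch_322'

samples = Channel
    .from( params.batch_names.tokenize(',') )
    // file() expands the glob for one batch; flatMap emits
    // each matching sample directory as a separate item
    .flatMap { batch -> file("${params.root_dir}/${batch}/sample_*", type: 'dir') }
```

The resulting `samples` channel should be usable in `my_process` exactly as above. A batch with no matching directories simply contributes nothing to the channel, so you may want to check for that separately.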

Upvotes: 1
