Reputation: 33
I have a dictionary with keys as patient IDs and a list of fastq files as values.
patient_samples = {
    "patientA": ["sample1", "sample2", "sample3"],
    "patientB": ["sample1", "sample4", "sample5", "sample6"]
}
I want to align each sample.fastq and output the aligned .bam file in a directory for each patient. The resulting directory structure I want is this:
├── patientA
│   ├── sample1.bam
│   ├── sample2.bam
│   ├── sample3.bam
├── patientB
│   ├── sample1.bam
│   ├── sample4.bam
│   ├── sample5.bam
│   ├── sample6.bam
Here I used lambda wildcards to get the samples for each patient using the "patient_samples" dictionary.
rule align:
    input:
        lambda wildcards: [
            "{0}.fastq".format(sample_id)
            for sample_id in patient_samples[wildcards.patient_id]
        ]
    output:
        "{patient_id}/{sample_id}.bam"
    shell:
        ### Alignment command
How can I write the rule all to reflect that only certain samples are aligned for each patient? I have tried referencing the dictionary key to specify the samples:
rule all:
    input:
        expand("{patient_id}/{sample_id}.bam", patient_id=patient_samples.keys(), sample_id=patient_samples[patient_id])
However, this leads to a NameError:
name 'patient_id' is not defined
Is there another way to do this?
Upvotes: 3
Views: 324
Reputation: 16551
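The error occurs because the arguments to expand are evaluated as ordinary Python before expand is called. At that point no variable named patient_id exists, so expand is never told which patient to use when listing the sample_id values: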
expand(
    "{patient_id}/{sample_id}.bam",
    patient_id=patient_samples.keys(),
    sample_id=patient_samples[patient_id])
                              ^^^^^^^^^^ Unknown
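The same failure can be reproduced outside Snakemake. The sketch below is plain Python and uses a hypothetical stand-in function, fake_expand, which is not part of Snakemake; it only shows that the keyword arguments are evaluated before the function body ever runs, so the lookup of patient_id fails first.

```python
patient_samples = {
    "patientA": ["sample1", "sample2", "sample3"],
    "patientB": ["sample1", "sample4", "sample5", "sample6"],
}


def fake_expand(pattern, **wildcards):
    # Hypothetical stand-in for expand; the call below never reaches this body.
    return [pattern]


try:
    fake_expand(
        "{patient_id}/{sample_id}.bam",
        patient_id=patient_samples.keys(),
        # Python looks up the variable patient_id here, before calling the function.
        sample_id=patient_samples[patient_id],
    )
except NameError as err:
    error_message = str(err)
    print(error_message)
```

The wildcard name inside the pattern string and the Python variable name are unrelated, which is why the keyword patient_id= on the previous line does not help.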
Using expand is convenient when you already have lists of wildcard values. In more complex cases, such as this one where each patient has its own set of samples, it is best to build the list in plain Python:
list_inputs_all = [
    f"{patient_id}/{sample_id}.bam"
    for patient_id, sample_ids in patient_samples.items()
    # each dictionary value is a list, so iterate over it as well
    for sample_id in sample_ids
]
rule all:
    input:
        list_inputs_all
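As a quick check of what such a list contains, here is a minimal standalone sketch in plain Python, using the patient_samples dictionary from the question. The variable name targets is only illustrative; the point is that each patient is paired with its own samples rather than with every sample.

```python
patient_samples = {
    "patientA": ["sample1", "sample2", "sample3"],
    "patientB": ["sample1", "sample4", "sample5", "sample6"],
}

# Outer loop walks the patients, inner loop walks that patient's own samples.
targets = [
    f"{patient_id}/{sample_id}.bam"
    for patient_id, sample_ids in patient_samples.items()
    for sample_id in sample_ids
]

for path in targets:
    print(path)
```

This yields seven paths, three under patientA and four under patientB, with no cross-patient combinations such as patientA/sample4.bam.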
Upvotes: 2