Reputation: 61
I tried to implement the approach from "How to use expand in snakemake when some particular combinations of wildcards are not desired?"
The goal is to process only the crossed combinations between SUPERGROUPS:
from itertools import product

DOMAINS=["Metallophos"]
SUPERGROUPS=["2supergroups","5supergroups"]
SUPERGROUPS_INVERSED=["5supergroups","2supergroups"]
CUTOFFS=["0"]

def filter_combinator(combinator, blacklist):
    def filtered_combinator(*args, **kwargs):
        for wc_comb in combinator(*args, **kwargs):
            # Use frozenset instead of tuple
            # in order to accommodate
            # unpredictable wildcard order
            if frozenset(wc_comb) not in blacklist:
                yield wc_comb
    return filtered_combinator

# "2supergroups/2supergroups" and "5supergroups/5supergroups" are undesired
forbidden = {
    frozenset({("supergroup", "2supergroups"), ("supergroup_other", "2supergroups")}),
    frozenset({("supergroup", "5supergroups"), ("supergroup_other", "5supergroups")})}

filtered_product = filter_combinator(product, forbidden)
rule target:
    input:
        expand(expand("results/{{domain}}/{supergroup}/{supergroup_other}/OGSmapping.txt.list.{{cutoff}}.statistics", filtered_product, supergroup=SUPERGROUPS, supergroup_other = SUPERGROUPS_INVERSED), cutoff=CUTOFFS, domain = DOMAINS)

rule tree_measures:
    input:
        tree="results/{domain}/{supergroup}/RAxML_bipartitionsBranchLabels.bbhlist.txt.{domain}.fa.aligned.rp.me-25.id.phylip.supergroups.for.notung",
        list="results/{domain}/{supergroup}/hmmer_search_bbh_1/bbhlist.txt.{domain}.fa.OGs.tbl.txt.0.list.txt.nh.OGs.txt",
        mapping1="results/{domain}/{supergroup_other}/{supergroup}/OGSmapping.txt.list",
        categories="results/{domain}/{supergroup}/{supergroup_other}/OGSmapping.txt.categories",
        mapping2="results/{domain}/{supergroup}/{supergroup_other}/OGSmapping.txt.list",
        supergroups="results/{domain}/{supergroup}/hmmer_search_2/{domain}.fa.OGs.tbl.txt.{cutoff}.supergroups.csv"
    output:
        "results/{domain}/{supergroup}/{supergroup_other}/OGSmapping.txt.list.{cutoff}.statistics"
    shell:
        "~/tools/Python-2.7.11/python scripts/tree_measures.py {input.tree} {input.list} {input.mapping1} {input.categories} {input.mapping2} {input.supergroups} {wildcards.cutoff} results/{wildcards.domain}/{wildcards.supergroup}/{wildcards.supergroup_other}/"
But I still get an error message:
Missing input files for rule tree_measures:
results/Metallophos/5supergroups/5supergroups/OGSmapping.txt.list
results/Metallophos/5supergroups/5supergroups/OGSmapping.txt.categories
What am I missing?
Upvotes: 5
Views: 761
Reputation: 8194
It seems that you need to perform the expand in two steps, as follows:
rule target:
    input:
        expand(expand("results/{{domain}}/{supergroup}/{supergroup_other}/OGSmapping.txt.list.{{cutoff}}.statistics", filtered_product, supergroup=SUPERGROUPS, supergroup_other = SUPERGROUPS_INVERSED), cutoff=CUTOFFS, domain = DOMAINS)
The inner expand uses the filtered_product trick, and the outer one is a normal expand. The blacklist entries contain only the supergroup and supergroup_other pairs, so the filter can only match when those are the only wildcards passed to the filtered call.
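Why the two steps matter can be checked without Snakemake. The sketch below assumes that expand hands the combinator one list of (wildcard, value) pairs per keyword argument; the `wildcard_lists` helper is a hypothetical stand-in for that behaviour, not part of Snakemake's API. Each blacklist entry is a frozenset of exactly two pairs, so it matches when only supergroup and supergroup_other are combined, but never when domain and cutoff are part of the same combination.

```python
from itertools import product

def filter_combinator(combinator, blacklist):
    def filtered_combinator(*args, **kwargs):
        for wc_comb in combinator(*args, **kwargs):
            if frozenset(wc_comb) not in blacklist:
                yield wc_comb
    return filtered_combinator

forbidden = {
    frozenset({("supergroup", "2supergroups"), ("supergroup_other", "2supergroups")}),
    frozenset({("supergroup", "5supergroups"), ("supergroup_other", "5supergroups")}),
}
filtered_product = filter_combinator(product, forbidden)

def wildcard_lists(**wildcards):
    # Hypothetical helper: mimics the assumed input to the combinator,
    # one list of (name, value) pairs per wildcard.
    return [[(name, value) for value in values]
            for name, values in wildcards.items()]

SUPERGROUPS = ["2supergroups", "5supergroups"]

# Only the two supergroup wildcards: each combination has two pairs,
# so the same/same combinations match the blacklist and are dropped.
two_wildcards = list(filtered_product(
    *wildcard_lists(supergroup=SUPERGROUPS, supergroup_other=SUPERGROUPS)))

# All four wildcards at once: each combination has four pairs,
# so no frozenset equals a two-pair blacklist entry.
four_wildcards = list(filtered_product(
    *wildcard_lists(supergroup=SUPERGROUPS, supergroup_other=SUPERGROUPS,
                    domain=["Metallophos"], cutoff=["0"])))

print(len(two_wildcards))   # 2
print(len(four_wildcards))  # 4
```

In the second call nothing is filtered, which would let a same/same path such as the one in the error message through.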
Another approach is to use itertools.permutations for the inner list:
from itertools import permutations

DOMAINS=["Metallophos"]
SUPERGROUPS=["2supergroups","5supergroups"]
CUTOFFS=["0"]

rule target:
    input:
        expand(
            ["results/{{domain}}/{supergroup}/{supergroup_other}/OGSmapping.txt.list.{{cutoff}}.statistics".format(supergroup=sgrp1, supergroup_other=sgrp2)
                for (sgrp1, sgrp2) in permutations(SUPERGROUPS)],
            cutoff=CUTOFFS, domain = DOMAINS)
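To preview which patterns the list comprehension hands to expand, it can be run in plain Python. This is only a sketch; the `pattern` and `inner` names are introduced here for illustration. It passes an explicit length of 2 to permutations, which gives the same result for two supergroups and, as an assumption about the intent, would still yield ordered pairs if more supergroups were added.

```python
from itertools import permutations

SUPERGROUPS = ["2supergroups", "5supergroups"]

# Doubled braces survive str.format as single braces,
# leaving {domain} and {cutoff} for the outer expand to fill in.
pattern = ("results/{{domain}}/{supergroup}/{supergroup_other}/"
           "OGSmapping.txt.list.{{cutoff}}.statistics")

inner = [pattern.format(supergroup=sgrp1, supergroup_other=sgrp2)
         for (sgrp1, sgrp2) in permutations(SUPERGROUPS, 2)]

for path in inner:
    print(path)
# results/{domain}/2supergroups/5supergroups/OGSmapping.txt.list.{cutoff}.statistics
# results/{domain}/5supergroups/2supergroups/OGSmapping.txt.list.{cutoff}.statistics
```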
Yet another possibility is to use zip:
rule target:
    input:
        expand(
            ["results/{{domain}}/{supergroup}/{supergroup_other}/OGSmapping.txt.list.{{cutoff}}.statistics".format(supergroup=sgrp1, supergroup_other=sgrp2)
                for (sgrp1, sgrp2) in zip(SUPERGROUPS, SUPERGROUPS_INVERSED)],
            cutoff=CUTOFFS, domain = DOMAINS)
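The zip variant pairs the two lists position by position, so it relies on SUPERGROUPS_INVERSED never holding the same value as SUPERGROUPS at the same index. A rough way to see the final target list outside Snakemake is sketched below; the plain `str.format` over `product(DOMAINS, CUTOFFS)` is a stand-in for the outer expand, assumed here only for illustration.

```python
from itertools import product

DOMAINS = ["Metallophos"]
SUPERGROUPS = ["2supergroups", "5supergroups"]
SUPERGROUPS_INVERSED = ["5supergroups", "2supergroups"]
CUTOFFS = ["0"]

pattern = ("results/{domain}/{supergroup}/{supergroup_other}/"
           "OGSmapping.txt.list.{cutoff}.statistics")

# zip yields (2supergroups, 5supergroups) and (5supergroups, 2supergroups);
# the nested loop stands in for the outer expand over domain and cutoff.
targets = [pattern.format(domain=d, supergroup=a, supergroup_other=b, cutoff=c)
           for (a, b) in zip(SUPERGROUPS, SUPERGROUPS_INVERSED)
           for (d, c) in product(DOMAINS, CUTOFFS)]

for path in targets:
    print(path)
# results/Metallophos/2supergroups/5supergroups/OGSmapping.txt.list.0.statistics
# results/Metallophos/5supergroups/2supergroups/OGSmapping.txt.list.0.statistics
```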
Upvotes: 3