pentavalentcarbon

Enforcing wildcard constraints in expansion

I want to collect all files matching the regex ^fs_node\d+\.xyz$, but I don't know how to write the expansion so that the glob uses the constraint. Right now, this snippet

wildcard_constraints:
    nodeidx = r"\d+",

rule all:
    input:
        expand("fs_node{i}.xyz",
               i=glob_wildcards("fs_node{nodeidx}.xyz").nodeidx)

produces an input list that also includes the _irc files, which I don't want:

    input: fs_node37_irc.xyz, fs_node41_irc.xyz, fs_node32.xyz, fs_node10.xyz, fs_node43.xyz, fs_node2.xyz, fs_node30_irc.xyz, fs_node16.xyz, fs_node45.xyz, fs_node23_irc.xyz, fs_node2_irc.xyz, fs_node44_irc.xyz, fs_node33_irc.xyz, fs_node35.xyz, fs_node1.xyz, fs_node28_irc.xyz, fs_node42.xyz, fs_node15_irc.xyz, fs_node12_irc.xyz, fs_node35_irc.xyz, fs_node42_irc.xyz, fs_node44.xyz, fs_node31.xyz, fs_node17_irc.xyz, fs_node8_irc.xyz, fs_node43_irc.xyz, fs_node15.xyz, fs_node5_irc.xyz, ...

How does one properly enforce (global) wildcard constraints in expansions? The constraint is global because it is also used in other places.
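To see why the _irc files slip through: as far as I understand, glob_wildcards does not consult the global wildcard_constraints, and an unconstrained wildcard is matched with the regex .+, so fs_node2_irc.xyz yields nodeidx = 2_irc. The plain-Python sketch below involves no Snakemake at all; the file names are a few taken from the listing above, and wildcard_regex and matched_indices are hypothetical helpers that hand-build an equivalent of the pattern "fs_node{nodeidx}.xyz". It compares .+ with \d+.

```python
import re

# A few file names taken from the listing in the question.
files = ["fs_node2.xyz", "fs_node2_irc.xyz", "fs_node37_irc.xyz", "fs_node43.xyz"]


def wildcard_regex(constraint):
    # Hand-built stand-in for "fs_node{nodeidx}.xyz" with the given regex
    # substituted for {nodeidx}; the literal dot before "xyz" is escaped.
    return re.compile(r"fs_node(?P<nodeidx>" + constraint + r")\.xyz")


def matched_indices(constraint):
    # Return the nodeidx value for every file name the pattern fully matches.
    pattern = wildcard_regex(constraint)
    return [m.group("nodeidx") for f in files if (m := pattern.fullmatch(f))]


print(matched_indices(".+"))    # ['2', '2_irc', '37_irc', '43']
print(matched_indices(r"\d+"))  # ['2', '43']
```

With .+ the wildcard happily absorbs the _irc suffix; with \d+ those names no longer match at all.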

Answers (1)

dariober

Maybe glob_wildcards is not flexible enough here. I would list all files explicitly, keep only those that match the regex, extract the variable part (nodeidx), and use those values for the wildcard. Not tested:

import os
import re

# The whole file name must be fs_node<digits>.xyz, so fs_node2_irc.xyz is skipped;
# the capture group holds the numeric part.
node_file = re.compile(r'fs_node(\d+)\.xyz')

nodeidx = []
for x in sorted(os.listdir(os.getcwd())):
    m = node_file.fullmatch(x)
    if m:
        nodeidx.append(m.group(1))

# Allow only the indices found above (an alternation of the exact values).
wildcard_constraints:
    nodeidx = '|'.join(re.escape(x) for x in nodeidx)

rule all:
    input:
        expand("fs_node{nodeidx}.xyz", nodeidx=nodeidx)
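The file-selection and constraint-building steps can be checked in plain Python before wiring them into a Snakefile. This is a sketch under some assumptions: select_node_indices and constraint_for are hypothetical helper names, the file names are made up from the listing in the question, and the last check assumes the constraint ends up inside a group of the larger wildcard regex, which is how I understand Snakemake builds it.

```python
import re

# Same rule as the regex ^fs_node\d+\.xyz$, with a capture group for the digits.
NODE_FILE = re.compile(r"fs_node(\d+)\.xyz")


def select_node_indices(filenames):
    """Hypothetical helper: numeric parts of names matching fs_node<digits>.xyz,
    sorted numerically so the order does not depend on the directory listing."""
    indices = []
    for name in filenames:
        m = NODE_FILE.fullmatch(name)
        if m:
            indices.append(m.group(1))
    return sorted(indices, key=int)


def constraint_for(indices):
    # Alternation of the exact indices, as in the wildcard_constraints above.
    return "|".join(re.escape(i) for i in indices)


names = ["fs_node37_irc.xyz", "fs_node32.xyz", "fs_node10.xyz",
         "fs_node2.xyz", "fs_node2_irc.xyz", "notes.txt"]
idx = select_node_indices(names)
print(idx)                  # ['2', '10', '32']
print(constraint_for(idx))  # 2|10|32

# Embed the alternation in a group, so "|" cannot split the surrounding pattern.
file_re = re.compile(r"fs_node(?:" + constraint_for(idx) + r")\.xyz")
print(bool(file_re.fullmatch("fs_node10.xyz")))     # True
print(bool(file_re.fullmatch("fs_node2_irc.xyz")))  # False
```

Sorting with key=int keeps fs_node2 before fs_node10; a plain string sort would put "10" first. Note that an empty list gives an empty constraint, so it may be worth failing early if no files are found.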
