Reputation: 39
I'm searching a very convoluted directory tree using os.walk in Python 2.7.7 and want to limit the search by pruning the list of directories in place.
import os, re
dirExclude = set(['amip4K','amip4xCO2','aqua4K','aqua4xCO2'])
for (path,dirs,files) in os.walk(inpath,topdown=True):
    dirs[:] = [d for d in dirs if d not in dirExclude]
    # Do something
I also want to exclude anything that matches the regular expression r'decadal[0-9]{4}', as if it were appended to this dirExclude list/set, but I'm having a hard time working out how best to use a regular expression in my list/set definition.
Any suggestions here? Or indeed a more efficient way to use the os.walk function?
After a number of suggestions the above can be improved to:
import os, re
dirExclude = set(['amip4K','amip4xCO2','aqua4K','aqua4xCO2'])
decExclude = re.compile(r'decadal[0-9]{4}')
for (path,dirs,files) in os.walk(inpath,topdown=True):
    dirs[:] = [d for d in dirs if d not in dirExclude and not decExclude.search(d)]
    # Do something
After investigating the dirs[:] = versus dirs = assignment: the [:] is needed to ensure that os.walk uses the pruned directory listing rather than the full (pre-pruned) directory listing.
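The difference between the two assignments can be checked on a throwaway directory tree. This is only a sketch: the helper walk_names and the directory names keep_me, skip_me, child and hidden are made up for illustration and are not part of the question.

```python
import os
import shutil
import tempfile

def walk_names(root, in_place):
    """Return the basenames of every directory os.walk visits under root."""
    exclude = set(['skip_me'])
    visited = set()
    for path, dirs, files in os.walk(root, topdown=True):
        visited.add(os.path.basename(path))
        kept = [d for d in dirs if d not in exclude]
        if in_place:
            dirs[:] = kept   # mutates the list object os.walk holds
        else:
            dirs = kept      # only rebinds the local name; os.walk is unaffected
    return visited

# Build a small hypothetical tree: root/keep_me/child and root/skip_me/hidden
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, 'keep_me', 'child'))
os.makedirs(os.path.join(root, 'skip_me', 'hidden'))

pruned = walk_names(root, in_place=True)
unpruned = walk_names(root, in_place=False)
shutil.rmtree(root)

print('hidden' in pruned)    # False: skip_me was never descended into
print('hidden' in unpruned)  # True: the plain assignment pruned nothing
```

Note that pruning only works with topdown=True (the default); with topdown=False the subdirectories have already been visited by the time dirs is available.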
Upvotes: 2
Views: 679
Reputation: 24788
Augmenting the previous suggestions, you can use ifilterfalse (or filterfalse in Python 3.x) to efficiently filter on a regular expression:
from itertools import ifilterfalse
import re
import os
exclude = {'foo', 'bar', 'baz'}
expr = re.compile(r'decadal\d{4}')
for (path, dirs, files) in os.walk(inpath):
    # The set difference drops the excluded names; note that a set does not keep the original order
    dirs[:] = set(ifilterfalse(expr.match, dirs)) - exclude
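If the traversal order matters, or the same script has to run on both Python 2 and 3, a variant along these lines might help. It is a sketch only: the function name prune and the sample directory names are invented for the example.

```python
import re

try:
    from itertools import ifilterfalse as filterfalse  # Python 2
except ImportError:
    from itertools import filterfalse                  # Python 3

exclude = set(['amip4K', 'aqua4K'])
expr = re.compile(r'decadal\d{4}')

def prune(dirs):
    """Filter dirs in place, keeping the surviving names in their original order."""
    dirs[:] = [d for d in filterfalse(expr.match, dirs) if d not in exclude]
    return dirs

sample = ['historical', 'decadal1960', 'amip4K', 'rcp45', 'xdecadal1970']
result = prune(sample)
print(result)
```

Because expr.match only matches at the start of the string, a name such as xdecadal1970 is kept here; expr.search would exclude it as well.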
Some further notes:
dirs = [alist] is insufficient because this only modifies what the local label dirs is referring to (i.e. it is no longer referring to the dirs list that os.walk uses). You must modify the actual list that os.walk references. You can do this (as above) with slice assignment. This is more or less equivalent to the expression: dirs.__setitem__(slice(None, None), [alist])
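A short check of that equivalence, using a second name as a stand-in for the reference that os.walk keeps internally (walker_ref is a hypothetical name, not anything os.walk exposes):

```python
dirs = ['amip4K', 'decadal1960', 'historical']
walker_ref = dirs   # stand-in for the reference os.walk holds to the same list

# Slice assignment replaces the contents of the existing list object
dirs[:] = ['decadal1960', 'historical']
via_slice = list(walker_ref)

# The explicit method call has the same effect
dirs.__setitem__(slice(None, None), ['historical'])
via_setitem = list(walker_ref)

# Plain assignment rebinds the name and leaves the shared list untouched
dirs = []
still_shared = walker_ref is dirs

print(via_slice)
print(via_setitem)
print(still_shared)
```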
Upvotes: 1
Reputation: 78690
Instead of adding to dirExclude, why not just check whether there's a match for r'decadal[0-9]{4}' in a dirname d?
I'm thinking of something like this:
import os
import re
dirExclude = set(['amip4K','amip4xCO2','aqua4K','aqua4xCO2'])
exre = re.compile(r'decadal[0-9]{4}')
for (path,dirs,files) in os.walk(inpath,topdown=True):
    # Slice assignment so that os.walk sees the pruned list
    dirs[:] = [d for d in dirs if d not in dirExclude and not exre.search(d)]
    # Do something
Explanation:
exre.search(d) will return None if there is no match for your regex inside d. not None will then evaluate to True. Otherwise, exre.search(d) will return a MatchObject and not exre.search(d) will evaluate to False.
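The truthiness behaviour described above can be seen directly. This is a small sketch with made-up directory names:

```python
import re

exre = re.compile(r'decadal[0-9]{4}')

hit = exre.search('decadal1960')    # a match object, which is truthy
miss = exre.search('historical')    # None, which is falsy

names = ['historical', 'decadal1960', 'mydecadal1975', 'decadal19605', 'rcp45']
kept = [d for d in names if not exre.search(d)]
print(kept)
```

Since search looks anywhere in the string and the pattern is not anchored, names like mydecadal1975 and decadal19605 are excluded too. If only exact names such as decadal1960 should be dropped, anchoring the pattern (for example r'^decadal[0-9]{4}$') is one option.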
Compiling the regular expression is optional. Without compiling, you would issue
exre = r'decadal[0-9]{4}'
and
dirs[:] = [d for d in dirs if d not in dirExclude and not re.search(exre, d)]
Compiling can be useful when you apply a regex many times, because the compile step then happens only once. However, most of the time you won't notice a difference, as even if you don't compile the regex manually Python will cache the most recently used regexes. To be precise, the last one hundred regexes, though the only reference I have for this is the Regular Expressions Cookbook by Jan Goyvaerts and Steven Levithan.
Upvotes: 1
Reputation: 5844
If you simply want to avoid all directories that match the re, you could do:
d_re = re.compile(r'decadal[0-9]{4}')
dirs[:] = [d for d in dirs if d_re.match(d) is None]
You could keep a record of all the ignored directories by adding the matching names to dirExclude before pruning dirs:
dirExclude.update(d for d in dirs if d_re.match(d))
or
dirExclude = dirExclude.union(d for d in dirs if d_re.match(d))
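One possible way to prune and record what was skipped in a single step is sketched below. The helper prune_and_record and the separate ignored set are assumptions made for this example, not part of the answer above.

```python
import re

d_re = re.compile(r'decadal[0-9]{4}')
dirExclude = set(['amip4K'])
ignored = set()   # hypothetical record of every directory name that was skipped

def prune_and_record(dirs):
    """Remove excluded names from dirs in place and remember them in ignored."""
    skipped = [d for d in dirs if d in dirExclude or d_re.match(d)]
    ignored.update(skipped)
    dirs[:] = [d for d in dirs if d not in skipped]

dirs = ['amip4K', 'decadal1960', 'historical']
prune_and_record(dirs)
print(dirs)
print(sorted(ignored))
```

Inside an os.walk loop, prune_and_record(dirs) would take the place of the dirs[:] = ... line.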
Upvotes: 0