Reputation: 1059
I am using Python 2.7 and trying to run:
glob('{faint,bright*}/{science,calib}/chip?/')
I get no matches. However, from the shell, echo {faint,bright*}/{science,calib}/chip? gives:
faint/science/chip1 faint/science/chip2 faint/calib/chip1 faint/calib/chip2 bright1/science/chip1 bright1/science/chip2 bright1w/science/chip1 bright1w/science/chip2 bright2/science/chip1 bright2/science/chip2 bright2w/science/chip1 bright2w/science/chip2 bright1/calib/chip1 bright1/calib/chip2 bright1w/calib/chip1 bright1w/calib/chip2 bright2/calib/chip1 bright2/calib/chip2 bright2w/calib/chip1 bright2w/calib/chip2
What is wrong with my expression?
Upvotes: 23
Views: 11523
Reputation: 18625
As that other guy pointed out, Python doesn't support brace expansion directly. But since brace expansion is done before the wildcards are evaluated, you could do that yourself, e.g.,
result = glob('{faint,bright*}/{science,calib}/chip?/')
becomes
result = [
    f
    for b in ['faint', 'bright*']
    for s in ['science', 'calib']
    for f in glob(f'{b}/{s}/chip?/')
]
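The comprehension above relies on an f-string, which needs Python 3.6 or later, while the question mentions Python 2.7. A minimal sketch of the same idea using itertools.product and str.format, which also scales to any number of brace groups; the helper name glob_alternatives is hypothetical, and each {} placeholder in the template stands for one hand-expanded brace group:

```python
import itertools
from glob import glob

def glob_alternatives(template, *alternatives):
    """Glob once per combination of alternatives substituted into template.

    template uses str.format placeholders ('{}'), one per list of alternatives.
    """
    matches = []
    for combo in itertools.product(*alternatives):
        # Each combo is one fully expanded pattern; wildcards are left to glob.
        matches.extend(glob(template.format(*combo)))
    return matches

result = glob_alternatives('{}/{}/chip?/',
                           ['faint', 'bright*'],
                           ['science', 'calib'])
```

The first list varies slowest, which matches the order of the nested for clauses in the comprehension.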
Upvotes: 4
Reputation: 5535
The wcmatch library has an interface similar to Python's standard glob, with options to enable brace expansion, tilde expansion, and more. Enabling brace expansion, for example:
from wcmatch import glob
glob.glob('{faint,bright*}/{science,calib}/chip?/', flags=glob.BRACE)
Upvotes: 2
Reputation: 13933
Combining globbing with brace expansion.
pip install braceexpand
Sample:
from glob import glob
from braceexpand import braceexpand

def braced_glob(path):
    # Expand the braces first, then glob each expanded pattern
    l = []
    for x in braceexpand(path):
        l.extend(glob(x))
    return l

>>> braced_glob('/usr/bin/{x,z}*k')
['/usr/bin/xclock', '/usr/bin/zipcloak']
Upvotes: 14
Reputation: 5974
As stated in other answers, brace-expansion is a pre-processing step for glob: you expand all the braces, then run glob on each of the results. (Brace-expansion turns one string into a list of strings.)
Orwellophile recommends the braceexpand library. This feels to me like too small of a problem to justify a dependency (though it's a common problem that ought to be in the standard library, ideally packaged in the glob module).
So here's a way to do it with a few lines of code.
import itertools
import re

def expand_braces(text, seen=None):
    if seen is None:
        seen = set()

    # Innermost balanced brace groups, ordered from highest index to lowest
    spans = [m.span() for m in re.finditer(r"\{[^\{\}]*\}", text)][::-1]
    # Comma-delimited alternatives inside each group
    alts = [text[start + 1 : stop - 1].split(",") for start, stop in spans]

    if len(spans) == 0:
        # Nothing left to expand: yield the text once
        if text not in seen:
            yield text
        seen.add(text)
    else:
        for combo in itertools.product(*alts):
            replaced = list(text)
            for (start, stop), replacement in zip(spans, combo):
                replaced[start:stop] = replacement
            # Recurse to expand any groups that were nested
            yield from expand_braces("".join(replaced), seen)

### testing
text_to_expand = "{{pine,}apples,oranges} are {tasty,disgusting} to m{}e }{"
for result in expand_braces(text_to_expand):
    print(result)
prints
pineapples are tasty to me }{
oranges are tasty to me }{
apples are tasty to me }{
pineapples are disgusting to me }{
oranges are disgusting to me }{
apples are disgusting to me }{
What's happening here is:
The seen set is used to only yield results that haven't yet been seen.
spans is the starting and stopping indexes of all innermost, balanced brackets in the text. The order is reversed by the [::-1] slice, such that indexes go from highest to lowest (will be relevant later).
alts is the corresponding list of comma-delimited alternatives.
If there are no spans (the text does not contain balanced brackets), yield the text itself, ensuring that it is unique with seen.
Otherwise, use itertools.product to iterate over the Cartesian product of comma-delimited alternatives.
Replace each bracketed span with its alternative from the current combination. Since the replacement is done in place, the text has to be a mutable sequence (list, rather than str), and we have to replace the highest indexes first. If we replaced the lowest indexes first, the later indexes would have changed from what they were in the spans. This is why we reversed spans when it was first created.
The text might have curly brackets within curly brackets. The regular expression only found balanced curly brackets that do not contain any other curly brackets, but nested curly brackets are legal. Therefore, we need to recurse until there are no nested curly brackets (the len(spans) == 0 case). Recursion with Python generators uses yield from to re-yield each result from the recursive call.
In the output, {{pine,}apples,oranges} is first expanded to {pineapples,oranges} and {apples,oranges}, and then each of these is expanded. The oranges result would appear twice if we didn't request unique results with seen.
Empty brackets like the ones in m{}e expand to nothing, so this is just me.
Unbalanced brackets, like }{, are left as-is.
This is not an algorithm to use if high performance for large datasets is required, but it's a general solution for reasonably sized data.
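To tie this back to the original question, each expanded string still has to be handed to glob. A minimal sketch of that last step, assuming an expander such as expand_braces above; the helper name glob_expanded is hypothetical, and the expander is passed in as an argument so the sketch does not depend on any particular implementation:

```python
from glob import glob

def glob_expanded(pattern, expand):
    """Run glob on every string produced by expand(pattern).

    Paths are returned in first-seen order without duplicates, because two
    expanded patterns (e.g. 'bright*' and 'bright1') can match the same path.
    """
    matches = []
    seen = set()
    for expanded in expand(pattern):
        for path in glob(expanded):
            if path not in seen:
                seen.add(path)
                matches.append(path)
    return matches

# Usage, with expand_braces defined as above:
# glob_expanded('{faint,bright*}/{science,calib}/chip?/', expand_braces)
```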
Upvotes: 2
Reputation: 425
Since {} aren't supported by glob() in Python, what you probably want is something like
import os
import re
...
# Regex equivalent of the glob: '*' becomes '[^/]*' and '?' becomes '[^/]'
match_dir = re.compile(r'(?:^|/)(faint|bright[^/]*)/(science|calib)/chip[^/]$')
for dirpath, dirnames, filenames in os.walk("/your/top/dir"):
    if match_dir.search(dirpath):
        do_whatever_with_files(dirpath, filenames)
        # OR
        do_whatever_with_subdirs(dirpath, dirnames)
Upvotes: 6
Reputation: 123400
{..} is known as brace expansion, and is a separate step applied before globbing takes place.
It's not part of globs, and not supported by the Python glob function.
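The effect can be seen without touching the filesystem. The standard glob module matches path components with fnmatch, so a small sketch using fnmatch.fnmatchcase shows that braces and commas are treated as ordinary characters (the sample names are taken from the question):

```python
from fnmatch import fnmatchcase

pattern = '{faint,bright*}'

# Braces and commas are literal here, so neither "alternative" matches.
print(fnmatchcase('faint', pattern))            # False
print(fnmatchcase('bright1', pattern))          # False

# Only a name that literally contains the braces and the comma matches.
print(fnmatchcase('{faint,bright1}', pattern))  # True
```

This is why the glob call in the question returns no matches: no directory is literally named {faint,bright...}.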
Upvotes: 10