AC101

Reputation: 938

Glob pattern to match all files that do not start with a prefix

I want to use a glob pattern to match all files in the src/ folder that do not start with a prefix of two uppercase letters followed by a dot.

The glob should match the following files:

foo.txt
foo-bar.txt
foo.bar.baz.txt
fo.txt

But it should not match the following files:

AB.foo.txt
AB.foo-bar.txt
XY.foo.bar.baz.txt
FO.fo.txt

The prefix will always be two uppercase letters (A to Z) followed by a dot.

Upvotes: 1

Views: 3053

Answers (4)

ncohen

Reputation: 473

If you look at glob.py, you can see that it uses fnmatch.filter to filter the paths, and fnmatch.filter uses fnmatch.translate to build a regex from the pattern. Therefore glob.glob("[!A-Z][!A-Z]*") can be used; it is translated to the regex '(?s:[^A-Z][^A-Z].*)\\Z' (the exact form varies between Python versions).

Note that this rejects any name with an uppercase letter in either of its first two positions, whether or not a dot follows, as well as any name shorter than two characters. The translate function is defined like this:

def translate(pat):
    """Translate a shell PATTERN to a regular expression.

    There is no way to quote meta-characters.
    """

So I believe there is no way to express a more complicated regular expression in a glob pattern.
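To see concretely what this pattern accepts, a quick check with fnmatch may help. This is only a sketch: the sample names come from the question, except for `Foo.txt`, which is a hypothetical extra case added here to show the over-exclusion. `fnmatch.fnmatchcase` is used so the result does not depend on the platform's case handling.

```python
import fnmatch
import re

pattern = "[!A-Z][!A-Z]*"

# translate() turns the glob into a regex string; its exact form varies by Python version.
regex = re.compile(fnmatch.translate(pattern))

# "Foo.txt" is a hypothetical name, not one of the files from the question.
names = ["foo.txt", "fo.txt", "AB.foo.txt", "XY.foo.bar.baz.txt", "Foo.txt"]

# Names accepted by the glob pattern, compared case-sensitively.
matched = [n for n in names if fnmatch.fnmatchcase(n, pattern)]
print(matched)
```

`Foo.txt` is rejected even though it has no two-letter prefix, which is the limitation described above.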

Upvotes: 0

Chumicat

Reputation: 302

This solution also uses listdir, but relies on a regular expression to select the files we need.

from os import listdir
import re

# Keep every file whose name does not start with two capital letters and a dot.
# This also picks up any non-text files in the directory.
file_list = [f for f in listdir('./src') if not re.match(r'[A-Z][A-Z]\.', f)]

# Same filter, but keep only .txt files.
file_list = [f for f in listdir('./src') if not re.match(r'[A-Z][A-Z]\.', f) and f.endswith('.txt')]

Upvotes: 0

sj95126

Reputation: 6908

glob() can mostly do what you're looking for, but with some limitations.

You can do this:

glob.glob("src/[!A-Z][!A-Z][!.]*")

which excludes any file that starts with two uppercase letters followed by a dot. However, this pattern excludes more than that: it also rejects any file with an uppercase letter in either of the first two positions, any file with a dot as its third character (such as fo.txt from the question), and any file name shorter than 3 characters. Globbing follows shell filename globbing syntax, and in a shell this kind of exclusion is more often done with find or grep.

If glob() isn't flexible enough, you'd have to glob all files and pattern match on your own.
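One way to glob everything and then filter is sketched below. The helper name `files_without_prefix` and the regex are illustrative assumptions, not something glob provides. Note that `*` does not match hidden files (names starting with a dot).

```python
import glob
import os
import re

# Two uppercase ASCII letters followed by a literal dot, anchored at the start of the name.
PREFIX = re.compile(r"[A-Z]{2}\.")

def files_without_prefix(directory):
    """Return sorted paths in `directory` whose file name does not start with the prefix."""
    # glob.escape() protects any glob metacharacters in the directory name itself.
    paths = glob.glob(os.path.join(glob.escape(directory), "*"))
    return sorted(p for p in paths if not PREFIX.match(os.path.basename(p)))

print(files_without_prefix("src"))
```

Because the filter is an ordinary regex on the base name, short names such as fo.txt are kept, and only names with the exact two-letter prefix are dropped.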

Upvotes: 2

balderman

Reputation: 23815

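How about the following, using listdir?

from os import listdir

# The length check avoids an IndexError on names shorter than three characters.
file_list = [f for f in listdir('./src') if not (len(f) >= 3 and f[0].isupper() and f[1].isupper() and f[2] == '.')]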

Upvotes: 0
