Mat

Reputation: 86474

Parameterised regular expression in Python

In Python, is there a better way to parameterise strings into regular expressions than doing it manually like this:

import re

test = 'flobalob'
names = ['a', 'b', 'c']
for name in names:
    regexp = "%s" % (name)
    print(regexp, re.search(regexp, test))

This noddy example tries to match each name in turn. I know there are better ways of doing that, but it's a simple example purely to illustrate the point.


The answer appears to be no, there's no real alternative. The best way to parameterise regular expressions in Python is as above or with derivatives such as str.format(). I tried to write a generic question, rather than 'fix ma codez, kthxbye'. For those still interested, I've fleshed out an example closer to my needs here:

import os
import re

filenames = ['bob.txt', 'fred.txt', 'paul.txt']
for diskfilename in os.listdir('.'):
    for filename in filenames:
        name, ext = filename.split('.')
        regexp = r"%s.*\.%s" % (name, ext)
        m = re.search(regexp, diskfilename)
        if m:
            print(diskfilename, regexp, m)
            # ...

I'm trying to figure out the 'type' of a file based on its filename, of the form <filename>_<date>.<extension>. In my real code, the filenames array is a dict, containing a function to call once a match is found.
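
A minimal sketch of that lookup, written for Python 3. The handler functions, the `handlers` mapping, and the helper names are hypothetical, and the date is assumed to be a run of digits after an underscore:

```python
import re

def handle_bob(path):
    return "bob handler got " + path

def handle_fred(path):
    return "fred handler got " + path

# Hypothetical mapping from a template filename to the function to call.
handlers = {"bob.txt": handle_bob, "fred.txt": handle_fred}

def build_pattern(template):
    """Turn 'bob.txt' into a compiled regex for 'bob_<digits>.txt'."""
    name, ext = template.rsplit(".", 1)
    return re.compile(r"{0}_[0-9]+\.{1}$".format(re.escape(name), re.escape(ext)))

# Compile each pattern once instead of on every directory entry.
compiled = [(build_pattern(t), fn) for t, fn in handlers.items()]

def dispatch(diskfilename):
    """Call the handler for the first matching pattern, or return None."""
    for pattern, fn in compiled:
        if pattern.match(diskfilename):
            return fn(diskfilename)
    return None
```

Calling `dispatch(name)` for each entry of `os.listdir('.')` would then route every recognised file to its handler and skip the rest.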

Other ways I've considered doing it:


Thanks for the responses suggesting alternatives to regular expressions to achieve the same end result. I was more interested in parameterising regular expressions, for now and for the future. I had never come across fnmatch, so it's all useful in the long run.

Upvotes: 3

Views: 889

Answers (3)

jfs

Reputation: 414245

import fnmatch, os

filenames = ['bob.txt', 'fred.txt', 'paul.txt']

# e.g. 'b.txt.b' -> 'b.txt*.b'
filepatterns = ((f, '*'.join(os.path.splitext(f))) for f in filenames)
# A list (not an iterator) so it can be filtered once per pattern.
diskfilenames = [f for f in os.listdir('.') if os.path.isfile(f)]
pattern2filenames = dict((fn, fnmatch.filter(diskfilenames, pat))
                         for fn, pat in filepatterns)

print(pattern2filenames)

Output:

{'bob.txt': ['bob20090217.txt'], 'paul.txt': [], 'fred.txt': []}

Answers to previous revisions of your question follow:


I don't understand your updated question but filename.startswith(prefix) might be sufficient in your specific case.

Since you've updated your question, the old answer below is less relevant.


  1. Use re.escape(name) if you'd like to match a name literally.

  2. Any tool available for string parametrization is applicable here. For example:

    import string
    print(string.Template("$a $b").substitute(a=1, b="B"))
    # 1 B
    

    Or using str.format() in Python 2.6+:

    print("{0.imag}".format(1j+2))
    # 1.0
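
To illustrate why re.escape() matters when a name is substituted into a pattern, here is a small sketch for Python 3. The name `a.b` and the eight-digit date suffix are made up for the example:

```python
import re

name = "a.b"  # hypothetical name containing a regex metacharacter

# Unescaped: '.' matches any character, so the pattern also matches "axb".
loose = re.compile("^{0}$".format(name))

# Escaped: re.escape() makes the name match literally.
strict = re.compile("^{0}$".format(re.escape(name)))

# With str.format(), literal braces in a quantifier must be doubled.
dated = re.compile(r"^{0}_\d{{8}}\.txt$".format(re.escape(name)))
```

The doubled braces are the one trap specific to str.format(): `{{8}}` comes out as the quantifier `{8}`, whereas a bare `{8}` would be read as a format field.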
    

Upvotes: 2

SilentGhost

Reputation: 319601

Maybe the glob and fnmatch modules can be of some help to you?
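
As a rough sketch of what fnmatch offers here (Python 3, with made-up filenames), a shell-style wildcard can filter names directly, or be translated into a regular expression:

```python
import fnmatch
import re

names = ["bob_20090216.txt", "bob.txt", "fred_20090216.csv"]

# Shell-style wildcard: '*' matches any run of characters.
matches = fnmatch.filter(names, "bob_*.txt")

# fnmatch.translate() converts the wildcard pattern into a regex string,
# which can then be compiled and reused like any other pattern.
regex = re.compile(fnmatch.translate("bob_*.txt"))
```

glob.glob("bob_*.txt") applies the same wildcard syntax to files that actually exist on disk.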

Upvotes: 2

paprika

Reputation: 2484

Well, as you build a regexp from a string, I see no other way. But you could parameterise the string itself with a dictionary:

d = {'bar': 'a', 'foo': 'b'}
regexp = '%(foo)s|%(bar)s' % d

Or, depending on the problem, you could use list comprehensions:

vlist = ['a', 'b', 'c']
regexp = '|'.join([s for s in vlist])
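
A slightly fuller sketch of the join approach for Python 3, assuming the names should match literally. The name `c.d` and the helper `first_name_in` are hypothetical:

```python
import re

vlist = ["a", "b", "c.d"]  # hypothetical names; "c.d" needs escaping

# Escape each name, then join into one alternation inside a named group.
regexp = re.compile("(?P<name>{0})".format("|".join(re.escape(s) for s in vlist)))

def first_name_in(text):
    """Return the first listed name found in text, or None."""
    m = regexp.search(text)
    return m.group("name") if m else None
```

A single compiled alternation scans the text once, and the named group reports which name matched.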

EDIT: Mat clarified his question, which changes things and makes the above irrelevant.

I'd probably go with an approach like this:

import re

filename = 'bob_20090216.txt'

# Raw strings with the dot escaped, so '.' only matches a literal dot.
regexps = {'bob': r'bob_[0-9]+\.txt',
           'fred': r'fred_[0-9]+\.txt',
           'paul': r'paul_[0-9]+\.txt'}

for filetype, regexp in regexps.items():
    m = re.match(regexp, filename)
    if m is not None:
        print('%s is of type %s' % (filename, filetype))
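
One caveat with re.match() is that it anchors only at the start of the string. A short sketch of the difference for Python 3.4+, using a made-up filename with a trailing suffix:

```python
import re

regexp = r"bob_[0-9]+\.txt"

# re.match() anchors only at the start, so trailing text is accepted.
prefix_hit = re.match(regexp, "bob_20090216.txt.bak")

# re.fullmatch() (Python 3.4+) requires the whole string to match.
whole_hit = re.fullmatch(regexp, "bob_20090216.txt.bak")
exact_hit = re.fullmatch(regexp, "bob_20090216.txt")
```

Where re.fullmatch() is unavailable, ending the pattern with `$` has much the same effect.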

Upvotes: 6
