pythonic metaphor

Reputation: 10556

Python regex matching in conditionals

I am parsing a file and I want to check each line against a few complicated regexes. Something like this:

if re.match(regex1, line): do stuff
elif re.match(regex2, line): do other stuff
elif re.match(regex3, line): do still more stuff
...

Of course, to do the stuff, I need the match objects. I can only think of three possibilities, each of which leaves something to be desired.

if re.match(regex1, line): 
    m = re.match(regex1, line)
    do stuff
elif re.match(regex2, line):
    m = re.match(regex2, line)
    do other stuff
...

which requires doing the complicated matching twice (these are long files and long regex :/)

m = re.match(regex1, line)
if m: do stuff
else:
    m = re.match(regex2, line)
    if m: do other stuff
    else:
       ...

which gets terrible as I indent further and further.

while True:
    m = re.match(regex1, line)
    if m:
        do stuff
        break
    m = re.match(regex2, line)
    if m:
        do other stuff
        break
    ...

which just looks weird.

What's the right way to do this?

Upvotes: 26

Views: 22846

Answers (12)

Sam Hanes

Reputation: 2849

You can define a class that wraps the match object and has a __call__ method to perform the match:

class ReMatcher(object):
    match = None

    def __call__(self, pattern, string):
        self.match = re.match(pattern, string)
        return self.match

    def __getattr__(self, name):
        return getattr(self.match, name)

Then call it in your conditions and use it as if it was a match object in the resulting blocks:

match = ReMatcher()

if match(regex1, line):
    print(match.group(1))

elif match(regex2, line):
    print(match.group(1))

This should work in nearly any Python version, with slight adjustments in versions before new-style classes. As in my other answer, you should use re.compile if you're concerned about regex performance.

Upvotes: 0

Sam Hanes

Reputation: 2849

You can define a local function that accepts a regex, tests it against your input, and stores the result to a closure-scoped variable:

match = None

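def matches(pattern):
    # Only `match` is rebound here; `line` is just read, so it needs no declaration.
    # `nonlocal` requires this helper to be defined inside another function.
    nonlocal match
    match = re.match(pattern, line)
    return match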

if matches(regex1):
    # do stuff with `match`

elif matches(regex2):
    # do other stuff with `match`

I'm not sure how Pythonic that approach is, but it's the cleanest way I've found to do regex matching in an if-elif-else chain and preserve the match objects.

Note that this approach will only work in Python 3.0+ as it requires the PEP 3104 nonlocal statement. In earlier Python versions there's no clean way for a function to assign to a variable in a non-global parent scope.

It's also worth noting that if your file is big enough that you're worried about running a regex twice for each line, you should also pre-compile the patterns with re.compile and pass the resulting regex objects to your check function instead of the raw strings.
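A minimal sketch of the two suggestions combined. The patterns `KEY_VALUE` and `COMMENT` and the function `classify_line` are made up for illustration; the helper sits inside a function so that `nonlocal` has an enclosing scope to bind to:

```python
import re

# Hypothetical patterns for illustration; compiled once, reused for every line.
KEY_VALUE = re.compile(r"(\w+)\s*=\s*(.*)")
COMMENT = re.compile(r"#\s*(.*)")


def classify_line(line):
    match = None

    def matches(pattern):
        # Rebind the enclosing function's `match` (requires Python 3).
        nonlocal match
        match = pattern.match(line)
        return match

    if matches(KEY_VALUE):
        return ("key_value", match.group(1), match.group(2))
    elif matches(COMMENT):
        return ("comment", match.group(1))
    return ("unknown", line)
```

Each pattern is tried at most once per line, and the branch body reads the result from `match`.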

Upvotes: 1

Uri Granta

Reputation: 1904

Your last suggestion is slightly more Pythonic when wrapped up in a function:

def parse_line():
    m = re.match(regex1, line)
    if m:
        do stuff
        return
    m = re.match(regex2, line)
    if m:
        do other stuff
        return
    ...

That said, you can get closer to what you want using a simple container class with some operator overloading:

class ValueCache():
    """A simple container with a returning assignment operator."""
    def __init__(self, value=None):
        self.value = value
    def __repr__(self):
        return "ValueCache({})".format(self.value)
    def set(self, value):
        self.value = value
        return value
    def __call__(self):
        return self.value
    def __lshift__(self, value):
        return self.set(value)
    def __rrshift__(self, value):
        return self.set(value)

match = ValueCache()
if (match << re.match(regex1, line)):
    do stuff with match()
elif (match << re.match(regex2, line)):
    do other stuff with match()

Upvotes: 1

Peter Lada

Reputation: 393

Make a class with the match as state. Instantiate it before the conditional; it should store the string that you are matching against as well.
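One possible reading of this suggestion, as a rough sketch. The class name `LineMatcher`, its `matches` method, and the example patterns are all assumptions made for illustration:

```python
import re


class LineMatcher:
    """Holds the string under test and the most recent match result."""

    def __init__(self, string):
        self.string = string
        self.match = None

    def matches(self, pattern):
        # Store the result so the caller can use it inside the branch.
        self.match = re.match(pattern, self.string)
        return self.match is not None


def describe(line):
    m = LineMatcher(line)
    if m.matches(r"(\d+) apples"):
        return "apples: " + m.match.group(1)
    elif m.matches(r"(\d+) pears"):
        return "pears: " + m.match.group(1)
    return "no match"
```

Because the instance keeps the string, each condition only has to name the pattern.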

Upvotes: 0

Alan

Reputation: 57

In this particular case there appears to be no convenient way to do this in Python. If Python accepted the syntax:

if (m = re.match(pattern,string)):
    text = m.group(1)

then all would be fine, but apparently you cannot do that.
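For readers on Python 3.8 or later: assignment expressions (PEP 572) allow essentially this syntax, spelled with `:=`. A minimal sketch, with made-up patterns and a hypothetical `handle` function:

```python
import re

# Hypothetical patterns for illustration.
regex1 = re.compile(r"GET (\S+)")
regex2 = re.compile(r"POST (\S+) (\d+)")


def handle(line):
    # `:=` binds m and yields the match object (or None) in one expression.
    if m := regex1.match(line):
        return ("get", m.group(1))
    elif m := regex2.match(line):
        return ("post", m.group(1), int(m.group(2)))
    return None
```

On older interpreters this is a syntax error, so the workarounds in the other answers still apply there.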

Upvotes: 4

eyquem

Reputation: 27585

My solution, with an example; only one re.search() call is performed per line:

text = '''\
koala + image @ wolf - snow
Good evening, ladies and gentlemen
An uninteresting line
There were 152 ravens on a branch
sea mountain sun ocean ice hot desert river'''

import re
# Raw strings keep backslash sequences such as \d intact for the regex engine
regx3 = re.compile(r'hot[ \t]+([^ ]+)')
regx2 = re.compile(r'(\d+|ev.+?ng)')
regx1 = re.compile(r'([%~#`\@+=\d]+)')
regx  = re.compile('|'.join((regx3.pattern,regx2.pattern,regx1.pattern)))

def one_func(line):
    print('I am one_func on : '+line)

def other_func(line):
    print('I am other_func on : '+line)

def another_func(line):
    print('I am another_func on : '+line)

# Same order as the alternatives in regx: group i belongs to tupl_funcs[i]
tupl_funcs = (one_func, other_func, another_func)


for line in text.splitlines():
    print(line)
    m = regx.search(line)
    if m:
        print('m.groups() : ', m.groups())
        # Index of the first group that captured something
        group_number = next(i for i, g in enumerate(m.groups()) if g)
        print("group_number : ", group_number)
        tupl_funcs[group_number](line)
    else:
        print('No match')
        print('No treatment')
    print()

result

koala + image @ wolf - snow
m.groups() :  (None, None, '+')
group_number :  2
I am another_func on : koala + image @ wolf - snow

Good evening, ladies and gentlemen
m.groups() :  (None, 'evening', None)
group_number :  1
I am other_func on : Good evening, ladies and gentlemen

An uninteresting line
No match
No treatment

There were 152 ravens on a branch
m.groups() :  (None, '152', None)
group_number :  1
I am other_func on : There were 152 ravens on a branch

sea mountain sun ocean ice hot desert river
m.groups() :  ('desert', None, None)
group_number :  0
I am one_func on : sea mountain sun ocean ice hot desert river

Upvotes: 0

Mu Mind

Reputation: 11214

FWIW, I've stressed over the same thing, and I usually settle for the 2nd form (nested elses) or some variation. I don't think you'll find anything much better in general, if you're looking to optimize readability (many of these answers seem significantly less readable than your candidates to me).

Sometimes if you're in an outer loop or a short function, you can use a variation of your 3rd form (the one with break statements) where you either continue or return, and that's readable enough, but I definitely wouldn't create a while True block just to avoid the "ugliness" of the other candidates.
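The `continue` variation might look like this when the matching already sits inside a loop over lines. This is only a sketch; the patterns and the `parse_lines` function are invented for the example:

```python
import re

# Hypothetical patterns for illustration.
NUMBER = re.compile(r"(\d+)$")
WORD = re.compile(r"([a-z]+)$")


def parse_lines(lines):
    results = []
    for line in lines:
        m = NUMBER.match(line)
        if m:
            results.append(("number", int(m.group(1))))
            continue
        m = WORD.match(line)
        if m:
            results.append(("word", m.group(1)))
            continue
        # Reached only when no pattern matched.
        results.append(("other", line))
    return results
```

The indentation stays flat however many patterns are added, and no artificial `while True` is needed.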

Upvotes: 0

Tim Pietzcker

Reputation: 336488

You could define a function for the action required by each regex and do something like

def dostuff():
    stuff

def dootherstuff():
    otherstuff

def doevenmorestuff():
    evenmorestuff

actions = ((regex1, dostuff), (regex2, dootherstuff), (regex3, doevenmorestuff))

for regex, action in actions:
    m = re.match(regex, line)
    if m: 
        action()
        break
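Since the question needs the match objects, a runnable variant of this idea could pass the match to each action. The handler names, patterns, and the `dispatch` function below are assumptions for illustration, not part of the answer above:

```python
import re


def on_assignment(m):
    return "assign {} to {}".format(m.group(2), m.group(1))


def on_comment(m):
    return "comment: " + m.group(1)


# Order matters: the first pattern that matches wins.
ACTIONS = (
    (re.compile(r"(\w+)\s*=\s*(\w+)"), on_assignment),
    (re.compile(r"#\s*(.*)"), on_comment),
)


def dispatch(line):
    for regex, action in ACTIONS:
        m = regex.match(line)
        if m:
            return action(m)
    return None
```

Using a tuple of pairs rather than a dict keeps the matching order explicit.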

Upvotes: 16

v_krishna

Reputation: 206

First off, do you really need to use regexps for your matching? Where I would use regexps in, e.g., Perl, I'll often use string methods in Python (find, startswith, etc.).
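As a rough illustration of the string-method route, assuming a made-up line format of "#" comments and "key: value" fields (the function name `classify_plain` is hypothetical):

```python
def classify_plain(line):
    # str.startswith and str.partition cover many simple "prefix" and
    # "key: value" cases without any regex.
    if line.startswith("#"):
        return ("comment", line[1:].strip())
    key, sep, value = line.partition(":")
    if sep:
        # partition returns an empty separator when ":" is absent.
        return ("field", key.strip(), value.strip())
    return ("other", line)
```

Whether this is enough depends on how irregular the real lines are.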

If you really need to use regexps, you can write a simple search function that performs the search, saves the match on a store object so it stays available, and returns True if there was a match.

e.g.,

def search(pattern, s, store):
    match = re.search(pattern, s)
    store.match = match
    return match is not None

class MatchStore(object):
    pass   # irrelevant, any object with a 'match' attr would do

where = MatchStore()
if search(pattern1, s, where):
    pattern1 matched, matchobj in where.match
elif search(pattern2, s, where):
    pattern2 matched, matchobj in where.match
...

Upvotes: 3

Cédric Julien

Reputation: 80851

Why not use a dictionary/switch statement?

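def action1(match_object, line):
    do the stuff 1
def action2(match_object, line):
    do the stuff 2

regex_action_dict = {regex1: action1, regex2: action2}
# Use .iteritems() on Python 2; dicts keep insertion order only on Python 3.7+
for regex, action in regex_action_dict.items():
    match_object = re.match(regex, line)
    if match_object:
        action(match_object, line)
        break  # stop at the first match, like an if/elif chain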

Upvotes: 0

Dan Breen

Reputation: 12934

for patt in (regex1, regex2, regex3):
    match = patt.match(line)
    if match:
        if patt == regex1:
            # some handling
        elif patt == regex2:
            # more
        elif patt == regex3:
            # more
        break

I like Tim's answer because it separates out the per-regex matching code to keep things simple. For my answer, I wouldn't put more than a line or two of code for each match, and if you need more, call a separate method.

Upvotes: 4

Joshua Smith

Reputation: 6651

I would break your regex up into smaller components and test the simple parts first, running the longer matches only when a simple part has matched.

Something like:

if re.match(simplepart,line):
      if re.match(complexregex, line):
          do stuff
elif re.match(othersimple, line):
      if re.match(complexother, line):
          do other stuff
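A runnable sketch of the same idea, using a cheap prefix test as the "simple part". The log-style patterns and the `parse_log_line` function are invented for the example, and whether the guard is actually faster depends on the patterns, so it is worth measuring:

```python
import re

# Hypothetical patterns for illustration.
COMPLEX_ERROR = re.compile(r"ERROR \[(\w+)\] code=(\d+)")
COMPLEX_WARN = re.compile(r"WARN \[(\w+)\] (.*)")


def parse_log_line(line):
    # A cheap prefix test guards each more expensive regex.
    if line.startswith("ERROR"):
        m = COMPLEX_ERROR.match(line)
        if m:
            return ("error", m.group(1), int(m.group(2)))
    elif line.startswith("WARN"):
        m = COMPLEX_WARN.match(line)
        if m:
            return ("warn", m.group(1), m.group(2))
    return None
```

Note that once a simple test succeeds, the later branches are skipped even if the complex match then fails, so the simple tests have to be mutually exclusive for this to behave like the original chain.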

Upvotes: 0
