n179911
n179911

Reputation: 20301

Need help in refactoring my python script

I have a python script which process a file line by line, if the line matches a regex, it calls a function to handle it.

My question is is there a better write to refactor my script. The script works, but as it is, i need to keep indent to the right of the editor as I add more and more regex for my file.

Thank you for any idea. Now my code end up like this:

for line in fi.readlines():

       result= reg1.match(line)

       if result:
               handleReg1(result)

       else:
               result = reg2.match(line)

               if result:
                       handleReg2(result)
               else:
                       result = reg3.match(line)

                       if result:
                               handleReg3(result)
                       else:
                               result = reg4.match(line)

                               if result:
                                       handleReg4(result)
                               else:
                                       result = reg5.match(line)

                                       if result:
                                              handleReg5(result)

Upvotes: 3

Views: 226

Answers (3)

ilya n.
ilya n.

Reputation: 18816

Here's a trivial one:

handlers = { reg1 : handleReg1, ... }

for line in fi.readlines():
    for h in handlers:
        x = h.match(line)
        if x:
            handlers[h](x)

If there could be a line that matches several regexps this code will be different from the code you pasted: it will call several handlers. Adding break won't help, because the regexps will be tried in a different order, so you'll end up calling the wrong one. So if this is the case you should iterate over list:

handlers = [ (reg1, handleReg1), (reg2, handleReg2), ... ]

for line in fi.readlines():
    for reg, handler in handlers:
        x = reg.match(line)
        if x:
            handler(x)
            break

Upvotes: 1

Nelson
Nelson

Reputation: 29706

An alternate approach that might work for you is to combine all the regexps into one giant regexp and use m.group() to detect which matched. My intuition says this should be faster, but I haven't tested it.

>>> reg = re.compile('(cat)|(dog)|(apple)')
>>> m = reg.search('we like dogs')
>>> print m.group()
dog
>>> print m.groups()
(None, 'dog', None)

This gets complicated if the regexps you're testing against are themselves complicated or use match groups.

Upvotes: 0

samtregar
samtregar

Reputation: 7148

I'd switch to using a data structure mapping regexes to functions. Something like:

map = { reg1: handleReg1, reg2: handleReg2, etc }

Then you just loop through them:

for reg, handler in map.items():
    result = reg.match(line)
    if result:
       handler(result)
       break

If you need the matches to happen in a particular order you'll need to use a list instead of a dictionary, but the principal is the same.

Upvotes: 12

Related Questions