dmzkrsk

Reputation: 2115

Python: Capture and return each element in quantifier

Given a set of strings like 60=60, 100=60+30+10, 200=120+50+30, 300=200+100, 180=60+50+40+20+10, I need a regex to parse (and validate) them. The match should be strict (e.g., no spaces allowed between numbers and operators).

I ended up with a regex like (\d+)=(\d+)(?:\+(\d+))*

It matches them all perfectly, but extracting matches with re.match(regex, string).groups() returns ('100', '60', '10'), ('200', '120', '30'), ...

See, the group under the * quantifier captured only the last number! That's expected, but it doesn't solve my problem.
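For reference, a minimal reproduction of that behaviour, using one of the sample strings from above:

```python
import re

regex = r"(\d+)=(\d+)(?:\+(\d+))*"

# A repeated capturing group keeps only its final iteration,
# so the middle term '30' never shows up in groups().
groups = re.match(regex, "100=60+30+10").groups()
print(groups)  # ('100', '60', '10')
```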

What is the most Pythonic way to return every match under the * quantifier separately, so that I could easily assert res[0] == sum(res[1:]) (after converting the captures to int)?

Currently, I match each piece independently, storing the last match position and continuing to parse from there, but it looks a bit ugly.

Upvotes: 1

Views: 102

Answers (4)

Saleem

Reputation: 9008

Try Python's built-in eval function to evaluate each expression at run time. I have changed the regex a little bit. It's general-purpose and can easily be adapted to any mathematical operation.

import re

data = "100=60+30+10, 200=120+50+30, 300=200+100, 180=60+50+40+20+10"

rx = r"(\d+)=([^, ]+)"

for res in re.finditer(rx, data, re.IGNORECASE | re.MULTILINE):
    lhs = eval(res.group(1))
    rhs = eval(res.group(2))
    assert lhs == rhs

And if you want to have some fun with the snippet, replace the regex with:

rx = r"([+-]?\d+(?:\.\d+)?)=([^, ]+)"

Now the left-hand side can be a positive or negative number, integer or decimal.
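If eval feels too permissive for the input at hand, the same check can be sketched without it. The version below assumes + is the only operator on the right-hand side, uses a pattern whose fractional part is optional, and does the arithmetic with decimal.Decimal. The name check_equations is made up for this illustration.

```python
import re
from decimal import Decimal

# Left-hand side: optionally signed integer or decimal; right-hand side: anything up to a comma or space.
rx = re.compile(r"([+-]?\d+(?:\.\d+)?)=([^, ]+)")

def check_equations(data):
    """Return (equation, is_correct) pairs; assumes '+' is the only operator."""
    results = []
    for m in rx.finditer(data):
        lhs = Decimal(m.group(1))
        # Decimal() raises InvalidOperation if a term is not a plain number.
        rhs = sum(Decimal(term) for term in m.group(2).split("+"))
        results.append((m.group(0), lhs == rhs))
    return results

print(check_equations("100=60+30+10, 1.5=1+0.5, 7=3+3"))
```

Using Decimal rather than float keeps comparisons such as 0.3=0.1+0.2 exact.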

Upvotes: 0

Quinn

Reputation: 4504

It seems easy to solve using regex (Python 2.7):

>>> import re
>>> strs = '60=60, 100=60+30+10, 200=120+50+30, 300=200+100, 180=60+50+40+20+10'
>>> pattern = r'((?:\d+)(?:|\+)|(?=|\+)(?:\d+))'
>>> [re.findall(pattern, s) for s in strs.split(',')]
[['60', '60'], ['100', '60', '30', '10'], ['200', '120', '50', '30'], ['300', '200', '100'], ['180', '60', '50', '40', '20', '10']]
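Note that findall on its own extracts the numbers but does not validate the string. One way to get both, sketched here for Python 3 (re.fullmatch was added in 3.4), is to validate with a strict pattern first and extract afterwards. The names STRICT and parse_sum are illustrative only.

```python
import re

# Strict shape: digits, '=', then one or more '+'-separated digit runs; no spaces.
STRICT = re.compile(r"\d+=\d+(?:\+\d+)*")

def parse_sum(expr):
    """Validate expr against STRICT, then return all its numbers as ints."""
    if STRICT.fullmatch(expr) is None:
        raise ValueError("malformed expression: %r" % expr)
    return [int(n) for n in re.findall(r"\d+", expr)]

res = parse_sum("100=60+30+10")
print(res)                      # [100, 60, 30, 10]
print(res[0] == sum(res[1:]))   # True
```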


Upvotes: 0

zmo

Reputation: 24802

And what about using a parser instead of a regex?

from pyparsing import *

integer = Word(nums).setParseAction(lambda t:int(t[0]))
variable = Word(alphas,exact=1)
operand = integer | variable

expop = Literal('^')
signop = oneOf('+ -')
multop = oneOf('* /')
plusop = oneOf('+ -')
factop = Literal('!')
equalop = Literal('=')

expr = operatorPrecedence( operand,
    [("=", 2, opAssoc.LEFT),
     ("+", 2, opAssoc.RIGHT),]
    )


test=['60=60', '70=10+20', '100=1+2+42+67']

for t in test:
    print t, u'→', expr.parseString(t)
    print

which would then output:

60=60 → [[60, '=', 60]]

70=10+20 → [[[70, '=', 10], '+', 20]]

100=1+2+42+67 → [[[100, '=', 1], '+', [2, '+', [42, '+', 67]]]]

Then, to get the integers, you'd only have to flatten the tree and pick out all the integers.


Another way, which I find slightly less elegant and which does no syntax checking of the string, would be to split the string on + and =:

for t in test:
    head, tail = t.split('=')
    values = [head] + tail.split('+')
    print t, u'→', values

which gives:

60=60 → ['60', '60']
70=10+20 → ['70', '10', '20']
100=1+2+42+67 → ['100', '1', '2', '42', '67']

Finally, we could try to find a regex magic bullet to answer your question, but honestly, that wouldn't be the way I'd solve this.


N.B.: to flatten a list, here's a way:

def flatten(seq):
    res = []
    for item in seq:
        if isinstance(item, (tuple, list)):
            res.extend(flatten(item))
        else:
            res.append(item)
    return res
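As a rough illustration of the "flatten, then pick out the integers" step, here is a Python 3 sketch that works on a plain nested list copied from the output shown earlier. With real pyparsing results you would probably want to convert them first (for example with ParseResults.asList()), since they are not list or tuple instances. The helper name integers_in is hypothetical.

```python
def integers_in(tree):
    """Yield every int found in an arbitrarily nested list or tuple."""
    for item in tree:
        if isinstance(item, (list, tuple)):
            yield from integers_in(item)
        elif isinstance(item, int):
            yield item  # operators such as '=' and '+' are skipped

# Parse tree for '100=1+2+42+67', as printed above.
tree = [[[100, '=', 1], '+', [2, '+', [42, '+', 67]]]]
numbers = list(integers_in(tree))
print(numbers)                          # [100, 1, 2, 42, 67]
print(numbers[0] == sum(numbers[1:]))   # False: this test string parses, but 1+2+42+67 is 112
```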

Upvotes: 3

zondo

Reputation: 20346

If + is the only operator that you can get (as I assume from the fact that you mentioned sum()), you need no regex. Just use regular .split():

total, expression = string.split("=")
assert int(total.strip()) == sum(int(x.strip()) for x in expression.split("+"))
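If the strictness asked for in the question matters (no spaces, nothing but digits), the .strip() calls can be swapped for an explicit digit check. A possible sketch, assuming Python 3.7+ for str.isascii(); the function name is_valid_sum is invented for this example:

```python
def is_valid_sum(s):
    """True if s looks like 'N=A+B+...' (digits only, no spaces) and the sum holds."""
    total, sep, expression = s.partition("=")
    terms = expression.split("+")
    # Reject a missing '=', empty pieces, spaces, signs, or non-ASCII digits.
    if not sep or not all(p.isascii() and p.isdigit() for p in [total] + terms):
        return False
    return int(total) == sum(int(p) for p in terms)

print(is_valid_sum("100=60+30+10"))    # True
print(is_valid_sum("100 = 60+30+10"))  # False: spaces are rejected
```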

Upvotes: 1
