Reputation: 2115
Given a set of strings like these:
60=60
100=60+30+10
200=120+50+30
300=200+100
180=60+50+40+20+10
I need a regex to parse (and validate) these strings. The match should be strict (e.g., no spaces allowed between numbers and operators).
I ended up with a regex like (\d+)=(\d+)(?:\+(\d+))*
It matches them all perfectly, but extracting the matches with re.match(regex, string).groups() returns ('100', '60', '10'), ('200', '120', '30'), ...
See, the * quantifier captured only the last number! That's expected, but it doesn't solve my problem.
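A minimal reproduction of the behaviour described above (Python 3; `PATTERN` is just the question's regex bound to a name). Every repetition of `(?:\+(\d+))*` writes into the same group 3, so only the last value survives, and the group is `None` when the repetition matches zero times:

```python
import re

# The question's pattern: group 3 sits inside a repeated (*) group.
PATTERN = r"(\d+)=(\d+)(?:\+(\d+))*"

# Each repetition overwrites group 3, so only the last '+N' survives.
print(re.match(PATTERN, "100=60+30+10").groups())  # ('100', '60', '10')

# With zero repetitions, group 3 never participates and is None.
print(re.match(PATTERN, "60=60").groups())  # ('60', '60', None)
```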
What is the most Pythonic way to get every number matched under the * quantifier separately, so that I could easily assert res[0] == sum(res[1:])?
Currently, I match each bit independently, storing the last match position and continuing to parse from that position, but it looks a bit ugly.
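One possible shape of the position-based approach mentioned above, as a sketch under assumptions: the names `HEAD`, `TAIL` and `parse_incremental` are hypothetical, not the asker's actual code. It relies on the `pos` argument of a compiled pattern's `match()`, which anchors each attempt exactly where the previous match ended, so any gap or stray character makes the parse fail:

```python
import re

# Hypothetical helper patterns: the "N=M" head, then each "+K" term.
HEAD = re.compile(r"(\d+)=(\d+)")
TAIL = re.compile(r"\+(\d+)")

def parse_incremental(s):
    """Return [total, term1, term2, ...] or raise ValueError if s is malformed."""
    m = HEAD.match(s)
    if m is None:
        raise ValueError(f"malformed expression: {s!r}")
    numbers = [int(m.group(1)), int(m.group(2))]
    pos = m.end()
    # Resume matching at the end of the previous match until the string is used up.
    while pos < len(s):
        m = TAIL.match(s, pos)
        if m is None:
            raise ValueError(f"malformed expression: {s!r}")
        numbers.append(int(m.group(1)))
        pos = m.end()
    return numbers

res = parse_incremental("180=60+50+40+20+10")
print(res)                       # [180, 60, 50, 40, 20, 10]
print(res[0] == sum(res[1:]))    # True
```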
Upvotes: 1
Views: 102
Reputation: 9008
Try Python's built-in eval function to evaluate the expression at run time. I have changed the regex a little. It's general purpose and can easily be adapted to any mathematical operation.
import re
data = "100=60+30+10, 200=120+50+30, 300=200+100, 180=60+50+40+20+10"
# Group 1: the total; group 2: everything after '=' up to the next comma or space
rx = r"(\d+)=([^, ]+)"
for res in re.finditer(rx, data):
    lhs = eval(res.group(1))
    rhs = eval(res.group(2))  # e.g. eval("60+30+10") gives 100
    assert lhs == rhs
And if you want to have some fun with the snippet, replace the regex with:
rx = r"([+-]?\d+(?:\.\d+)?)=([^, ]+)"
Now you can evaluate positive, negative, integer and decimal numbers too.
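A quick sketch of the decimal-aware variant, with the fractional part written as optional (`(?:\.\d+)?`) so plain integers still match. The sample data and the `check_all` name are made up for illustration. Because floats are involved, it compares with `math.isclose` instead of `==` (a substitution, not part of the answer above): `0.1+0.2` evaluates to `0.30000000000000004`, so an exact comparison would reject the last sample:

```python
import math
import re

# Optional sign, integer part, optional fractional part, then '=' and the expression.
rx = r"([+-]?\d+(?:\.\d+)?)=([^, ]+)"
data = "1.5=0.5+1, -3=-1-2, 0.3=0.1+0.2"

def check_all(data):
    """Return one bool per 'lhs=expression' found in data."""
    results = []
    for res in re.finditer(rx, data):
        lhs = float(res.group(1))
        rhs = eval(res.group(2))  # eval runs arbitrary Python: trusted input only
        results.append(math.isclose(lhs, rhs))
    return results

print(check_all(data))  # [True, True, True]
```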
Upvotes: 0
Reputation: 4504
It seems easy to solve using regex (Python 2.7):
>>> import re
>>> strs = '60=60, 100=60+30+10, 200=120+50+30, 300=200+100, 180=60+50+40+20+10'
>>> pattern = r'((?:\d+)(?:|\+)|(?=|\+)(?:\d+))'
>>> [re.findall(pattern, s) for s in strs.split(',')]
[['60', '60'], ['100', '60', '30', '10'], ['200', '120', '50', '30'], ['300', '200', '100'], ['180', '60', '50', '40', '20', '10']]
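The pattern above effectively captures every run of digits (both alternatives require `\d+`, and the empty branch of `(?:|\+)` is tried first), so it extracts the numbers but does not validate the string. A sketch that keeps the question's strictness by validating first with `fullmatch()` and then extracting with `re.findall(r"\d+", ...)` (Python 3.4+; `VALID` and `extract_numbers` are hypothetical names):

```python
import re

# Whole-string validation: digits, '=', digits, then zero or more '+digits'.
# fullmatch() rejects anything left over, such as spaces or a trailing '+'.
VALID = re.compile(r"\d+=\d+(?:\+\d+)*")

def extract_numbers(expr):
    """Validate expr strictly, then return all of its numbers as ints."""
    if VALID.fullmatch(expr) is None:
        raise ValueError(f"malformed expression: {expr!r}")
    return [int(n) for n in re.findall(r"\d+", expr)]

strs = "60=60, 100=60+30+10, 200=120+50+30, 300=200+100, 180=60+50+40+20+10"
for s in strs.split(", "):
    res = extract_numbers(s)
    assert res[0] == sum(res[1:]), s
```

Unlike `re.match`, `fullmatch` also refuses input such as `"100=60+40junk"`.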
Upvotes: 0
Reputation: 24802
What about not using a regex at all, and using a parser instead?
from pyparsing import Word, alphas, nums, infixNotation, opAssoc

# Integers are converted to int as they are parsed
integer = Word(nums).setParseAction(lambda t: int(t[0]))
variable = Word(alphas, exact=1)
operand = integer | variable

# infixNotation (called operatorPrecedence in older pyparsing releases)
# lists operators from highest to lowest precedence
expr = infixNotation(operand,
    [("=", 2, opAssoc.LEFT),
     ("+", 2, opAssoc.RIGHT)]
)

test = ['60=60', '70=10+20', '100=1+2+42+67']
for t in test:
    print(t, '→', expr.parseString(t))
which would then output:
60=60 → [[60, '=', 60]]
70=10+20 → [[[70, '=', 10], '+', 20]]
100=1+2+42+67 → [[[100, '=', 1], '+', [2, '+', [42, '+', 67]]]]
Then, to get the integers, you only have to flatten the tree and pick out all the integers.
Another way, which I find slightly less elegant and which does not check the syntax of the string, is to split the string on + and =:
for t in test:
    head, tail = t.split('=')
    values = [head] + tail.split('+')
    print(t, '→', values)
which gives:
60=60 → ['60', '60']
70=10+20 → ['70', '10', '20']
100=1+2+42+67 → ['100', '1', '2', '42', '67']
Finally, we could try to find a regex magic bullet to answer your question, but honestly, that wouldn't be the way I'd solve this.
N.B.: to flatten a list, here's a way:
def flatten(seq):
    res = []
    for item in seq:
        # Recurse into nested lists/tuples, keep everything else as-is
        if isinstance(item, (tuple, list)):
            res.extend(flatten(item))
        else:
            res.append(item)
    return res
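Applied to the nested output shown earlier for `'100=1+2+42+67'` (copied here as a plain list, so the snippet runs without pyparsing installed), `flatten()` plus an `isinstance` filter recovers the integers. With a live pyparsing result, convert it with `.asList()` first: `ParseResults` is not a `list` subclass, so `flatten()` would not descend into its nested groups. Note that this particular test string does not actually balance:

```python
def flatten(seq):
    res = []
    for item in seq:
        # Recurse into nested lists/tuples, keep everything else as-is
        if isinstance(item, (tuple, list)):
            res.extend(flatten(item))
        else:
            res.append(item)
    return res

# Nested result copied from the pyparsing output for '100=1+2+42+67'.
tree = [[[100, '=', 1], '+', [2, '+', [42, '+', 67]]]]

# Drop the operator strings, keep only the parsed ints.
numbers = [x for x in flatten(tree) if isinstance(x, int)]
print(numbers)                          # [100, 1, 2, 42, 67]
print(numbers[0] == sum(numbers[1:]))   # False: 1+2+42+67 is 112
```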
Upvotes: 3
Reputation: 20346
If + is the only operator you can get (as I assume from the fact that you mentioned sum()), you don't need a regex. Just use the regular .split():
total, expression = string.split("=")
assert int(total.strip()) == sum(int(x.strip()) for x in expression.split("+"))
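`.strip()` and `int()` both tolerate surrounding whitespace, so the snippet above accepts `'100 = 60 + 40'`, which the question wanted rejected. A stricter split-based sketch, assuming only non-negative ASCII integers are allowed (`check_sum` is a hypothetical name; `str.isascii` needs Python 3.7+):

```python
def check_sum(string):
    """Return True if string is 'N=A+B+...' and N equals the sum.

    Raises ValueError for anything other than ASCII digits, one '=' and '+'.
    """
    total, sep, expression = string.partition("=")
    parts = [total] + expression.split("+")
    # isdigit() alone also accepts characters like '²'; isascii() rules those out.
    # Empty parts (e.g. '=60' or '60=60+') fail isdigit().
    if not sep or not all(p.isascii() and p.isdigit() for p in parts):
        raise ValueError(f"malformed expression: {string!r}")
    numbers = [int(p) for p in parts]
    return numbers[0] == sum(numbers[1:])

print(check_sum("100=60+30+10"))  # True
print(check_sum("100=60+30"))     # False
```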
Upvotes: 1