I need to parse lines like these:
foo, bar > 1.0, baz = 2.0
foo bar > 1.0 baz = 2.0
foo, bar, baz
foo bar baz
Each element is either $string (>|<|<=|>=|=) $num or just $string, and the ',' separator between elements is optional.
In all of these cases, the result should be:
['foo', 'bar', 'baz']
How can I do this in Python?
You can just extract all the letter groups, because the operators and numbers contain no letters:
s = """
foo, bar > 1.0, baz = 2.0
foo bar > 1.0 baz = 2.0
foo, bar, baz
foo bar baz
"""
import re
regex = re.compile(r'([a-z]+)', re.I)  # re.I: ignore-case flag
for line in s.splitlines():
    if not line:
        continue  # skip empty lines
    print(regex.findall(line))
>>>
['foo', 'bar', 'baz']
['foo', 'bar', 'baz']
['foo', 'bar', 'baz']
['foo', 'bar', 'baz']
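The letter-group approach ignores the operators and numbers completely, so it also accepts malformed lines such as foo > > bar. If that matters, one option is a small tokenizer that follows the element grammar from the question. This is a minimal sketch, not a reference implementation: the names TOKEN_RE, tokenize and names_from_line are hypothetical, names are assumed to be identifiers ([A-Za-z_]\w*), and numbers are assumed to be unsigned, like 1 or 1.0.

```python
import re

# Hypothetical token spec based on the grammar in the question.
# Assumptions: names are identifiers, numbers are unsigned (1 or 1.0).
# Two-character operators come first so "<=" is not read as "<".
TOKEN_RE = re.compile(r"""
      (?P<name>[A-Za-z_]\w*)      # element name
    | (?P<op><=|>=|<|>|=)         # comparison operator
    | (?P<num>\d+(?:\.\d+)?)      # number such as 1 or 1.0
    | (?P<comma>,)                # optional separator
    | (?P<ws>\s+)                 # whitespace, discarded
""", re.VERBOSE)


def tokenize(line):
    """Split a line into (kind, text) pairs, rejecting unknown characters."""
    tokens = []
    pos = 0
    while pos < len(line):
        m = TOKEN_RE.match(line, pos)
        if m is None:
            raise ValueError(f"unexpected {line[pos]!r} at column {pos}")
        if m.lastgroup != "ws":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


def names_from_line(line):
    """Return element names, enforcing NAME [OP NUM] with optional commas."""
    tokens = tokenize(line)
    names = []
    i = 0
    while i < len(tokens):
        kind, text = tokens[i]
        if kind != "name":
            raise ValueError(f"expected a name, got {text!r}")
        names.append(text)
        i += 1
        # an operator must be followed by a number
        if i < len(tokens) and tokens[i][0] == "op":
            if i + 1 == len(tokens) or tokens[i + 1][0] != "num":
                raise ValueError(f"operator {tokens[i][1]!r} needs a number")
            i += 2
        # at most one comma between elements, and none at the end
        if i < len(tokens) and tokens[i][0] == "comma":
            i += 1
            if i == len(tokens):
                raise ValueError("trailing comma")
    return names
```

For example, names_from_line("foo, bar > 1.0, baz = 2.0") returns ['foo', 'bar', 'baz'], while names_from_line("foo > > bar") raises ValueError. Keeping the lexing step separate from the grammar check makes it easy to also return the operators and numbers later if they are needed.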
This one also checks the syntax:
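import re

with open("input") as f:
    for line in f:
        line = line.strip()
        # chop a line into expressions of the form: str [OP NUMBER];
        # the capturing group makes re.split keep the expressions themselves
        pieces = re.split(r'(\w+\s*(?:[!<>=]=?\s*[\d.]+)?\s*,?\s*)', line)
        # whatever lies between the expressions must be empty,
        # otherwise the line does not follow the syntax
        if any(p.strip() for p in pieces[0::2]):
            print("syntax error:", line)
            continue
        # get the str part of each expression
        names = [re.match(r'\w+', expr).group() for expr in pieces[1::2]]
        if names:
            print(names)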
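Another option is to write the element grammar as a single regular expression, check the whole line with re.fullmatch (Python 3.4+), and only then pull out the names with findall. This is a sketch under the same assumptions as before (identifier names, unsigned numbers, operators <=, >=, <, >, =); ELEMENT, SEPARATOR, LINE_RE and parse_names are hypothetical names.

```python
import re

# Hypothetical patterns following the grammar in the question:
#   element   := NAME [ OP NUMBER ]   with OP one of <=, >=, <, >, =
#   separator := a comma with optional whitespace, or just whitespace
ELEMENT = r"([A-Za-z_]\w*)(?:\s*(?:<=|>=|<|>|=)\s*\d+(?:\.\d+)?)?"
SEPARATOR = r"(?:\s*,\s*|\s+)"
LINE_RE = re.compile(rf"\s*{ELEMENT}(?:{SEPARATOR}{ELEMENT})*\s*")
ELEMENT_RE = re.compile(ELEMENT)


def parse_names(line):
    """Validate the whole line, then return the NAME part of each element."""
    if LINE_RE.fullmatch(line) is None:
        raise ValueError(f"malformed line: {line!r}")
    # findall returns group 1 (the name) of every element, left to right;
    # each match also consumes the element's optional "OP NUMBER" part
    return ELEMENT_RE.findall(line)
```

A repeated group in LINE_RE only remembers its last capture, which is why the names are collected in a second findall pass instead of from the fullmatch result. Because at least one element is required, an empty line raises ValueError, so skip blank lines before calling parse_names.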