vts

Reputation: 911

Python string split using regex

I need to parse lines like these:

foo, bar > 1.0, baz = 2.0
foo  bar > 1.0  baz = 2.0
foo, bar, baz
foo  bar  baz

Each element is either $string (>|<|<=|>=|=) $num or just $string. The ',' separator between elements is optional.

In all of these cases, the result should be:

['foo', 'bar', 'baz']

How can I do this in Python?

Upvotes: 1

Views: 199

Answers (3)

Inbar Rose

Reputation: 43437

You can just extract all the letter groups:

s = """
foo, bar > 1.0, baz = 2.0
foo  bar > 1.0  baz = 2.0
foo, bar, baz
foo  bar  baz
"""

import re
regex = re.compile(r'([a-z]+)', re.I)  # re.I (ignore case flag)

for line in s.splitlines():
    if not line:
        continue # skip empty lines

    print(regex.findall(line))

>>> 
['foo', 'bar', 'baz']
['foo', 'bar', 'baz']
['foo', 'bar', 'baz']
['foo', 'bar', 'baz']
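
One caveat with `[a-z]+`: it also picks up letters inside numbers such as `1e5`, and it truncates names like `foo_1`. A possible variant is sketched below. It assumes names follow identifier rules (a letter or underscore, then letters, digits, or underscores), which the question does not state, and the helper name `names` is only for illustration:

```python
import re

# Assumption: a name starts with a letter or underscore and may continue
# with letters, digits, or underscores. \b keeps the match from starting
# in the middle of a number such as 1e5.
NAME = re.compile(r'\b[A-Za-z_]\w*\b')

def names(line):
    """Return every identifier-like token found in the line."""
    return NAME.findall(line)

print(names("foo_1, bar2 > 1.0, baz = 2.0"))  # ['foo_1', 'bar2', 'baz']
print(names("x > 1e5"))                       # ['x']
```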

Upvotes: 2

perreal

Reputation: 97918

This one also takes the syntax of each expression into account:

import re
with open("input") as f:
    for line in f:
        line = line.strip()
        # chop a line into expressions of the form: str [OP NUMBER]
        exprs = re.split(r'(\w+\s*(?:[!<>=]=?\s*[\d.]*)?\s*,?\s*)', line)
        for expr in exprs:
            # chop each expression into tokens and get the str part
            tokens = re.findall(r'(\w+)\s*(?:[!<>=]=?\s*[\d.]*)?,?', expr)
            if tokens:
                print(tokens)
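
If you want malformed input to be rejected rather than skipped, one option is to consume the line element by element and fail when something does not fit. This is only a sketch: the number format (optional sign, digits with an optional decimal point) and the names `ELEMENT` and `parse_names` are my assumptions, not part of the question.

```python
import re

# One element: $string, optionally followed by an operator and $num,
# then an optional comma.
ELEMENT = re.compile(r'''
    \s*(?P<name>[A-Za-z_]\w*)          # the name part
    (?:\s*(?:<=|>=|<|>|=)\s*           # optional operator ...
       [+-]?(?:\d+\.?\d*|\.\d+))?      # ... followed by a number (assumed format)
    \s*,?                              # optional separator
''', re.VERBOSE)

def parse_names(line):
    """Return the names in the line, or raise ValueError on bad syntax."""
    line = line.strip()
    names, pos = [], 0
    while pos < len(line):
        m = ELEMENT.match(line, pos)
        if m is None:
            raise ValueError("unexpected input at column %d: %r" % (pos, line[pos:]))
        names.append(m.group('name'))
        pos = m.end()
    return names

print(parse_names("foo  bar > 1.0  baz = 2.0"))  # ['foo', 'bar', 'baz']
```

A line such as `foo bar >` raises `ValueError`, because the operator is not followed by a number.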

Upvotes: 0

Anirudha

Reputation: 32787

You can split on every run of non-alphabetic characters:

re.split(r"[^a-zA-Z]+", input)

This assumes that your $string contains only letters.


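The split leaves an empty string wherever the line starts or ends with a separator. You can remove those with filter (wrap it in list() on Python 3, where filter returns an iterator):

list(filter(None, str_list))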
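
Putting the two steps together, a minimal sketch assuming Python 3 and letter-only names (`split_names` is just an illustrative name):

```python
import re

def split_names(line):
    # Split on every run of non-letters; the trailing " = 2.0" is itself
    # a separator, so the result ends with an empty string.
    parts = re.split(r'[^a-zA-Z]+', line)
    # filter(None, ...) drops the empty strings.
    return list(filter(None, parts))

print(split_names("foo, bar > 1.0, baz = 2.0"))  # ['foo', 'bar', 'baz']
```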

Upvotes: 3
