Reputation: 6393
I have to parse an input string in Python and extract certain parts from it.
The format of the string is
(xx,yyy,(aa,bb,...)) // the inner parentheses can hold one or more items
I want a function that returns xx, yyy and a list containing aa, bb, ... etc.
I can of course do it by splitting on the parentheses and so on, but I want to know if there is a proper Pythonic way of extracting such info from a string.
I have this code, which works, but is there a better way to do it (without regex)?
def processInput(inputStr):
    value = inputStr.strip()[1:-1]
    parts = value.split(',', 2)
    return parts[0], parts[1], (parts[2].strip()[1:-1]).split(',')
Upvotes: 1
Views: 2927
Reputation: 1965
Your solution is decent (simple, efficient). You could use regular expressions to restrict the syntax if you don't trust your data source.
import re

parser_re = re.compile(r'\(([^,)]+),([^,)]+),\(([^)]+)\)')

def parse(text):
    m = parser_re.match(text)
    if m:
        first = m.group(1)
        second = m.group(2)
        rest = m.group(3).split(",")
        return (first, second, rest)
    else:
        return None

print(parse('(xx,yy,(aa,bb,cc,dd))'))
print(parse('xx,yy,(aa,bb,cc,dd)'))  # doesn't parse, returns None
# can use this to unpack the various parts.
# first, second, rest = parse(...)
Prints:
('xx', 'yy', ['aa', 'bb', 'cc', 'dd'])
None
Upvotes: 0
Reputation: 882631
If you're allergic to REs, you could use pyparsing:
>>> import pyparsing as p
>>> ope, clo, com = map(p.Suppress, '(),')
>>> w = p.Word(p.alphas)
>>> s = ope + w + com + w + com + ope + p.delimitedList(w) + clo + clo
>>> x = '(xx,yyy,(aa,bb,cc))'
>>> list(s.parseString(x))
['xx', 'yyy', 'aa', 'bb', 'cc']
pyparsing also makes it easy to control the exact form of results (e.g. by grouping the last 3 items into their own sublist), if you want. But I think the nicest aspect is how natural (depending on how much space you want to devote to it) you can make the "grammar specification" read: an open paren, a word, a comma, a word, a comma, an open paren, a delimited list of words, two closed parentheses (if you find the assignment to s above not so easy to read, I guess it's my fault for not choosing longer identifiers ;-).
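The grouping mentioned above can be sketched as follows. This is a minimal variation on the session shown earlier, assuming pyparsing is installed; the name inner is only illustrative. Wrapping the inner list in p.Group makes it come back as its own sublist.

```python
import pyparsing as p

# Suppress the punctuation so it does not appear in the results.
ope, clo, com = map(p.Suppress, '(),')
w = p.Word(p.alphas)

# "inner" is a hypothetical name: the parenthesized list, kept as one sublist.
inner = p.Group(ope + p.delimitedList(w) + clo)
s = ope + w + com + w + com + inner + clo

result = s.parseString('(xx,yyy,(aa,bb,cc))').asList()
print(result)  # ['xx', 'yyy', ['aa', 'bb', 'cc']]
```

The first two words stay at the top level, so the result can be unpacked as first, second, rest = result.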
Upvotes: 3
Reputation: 63772
If your parenthesis nesting can be arbitrarily deep, then regexen won't do, you'll need a state machine or a parser. Pyparsing supports recursive grammars using forward-declaration class Forward:
from pyparsing import Forward, Group, Suppress, Word, alphas, delimitedList

LPAR, RPAR, COMMA = map(Suppress, "(),")
nestedParens = Forward()
listword = Word(alphas) | '...'
# Define the forward-declared expression in terms of itself.
nestedParens <<= Group(LPAR + delimitedList(listword | nestedParens) + RPAR)

text = "(xx,yyy,(aa,bb,...))"
results = nestedParens.parseString(text).asList()
print(results)

text = "(xx,yyy,(aa,bb,(dd,ee),ff,...))"
results = nestedParens.parseString(text).asList()
print(results)
Prints:
[['xx', 'yyy', ['aa', 'bb', '...']]]
[['xx', 'yyy', ['aa', 'bb', ['dd', 'ee'], 'ff', '...']]]
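If you would rather not add a dependency, the "state machine or a parser" option can also be written by hand. The following is only a sketch of a small recursive-descent parser using the standard library; parse_nested is a hypothetical name, and it assumes items are plain text without quoting or escaping.

```python
def parse_nested(text):
    """Parse a string like '(a,b,(c,d))' into nested lists of strings."""
    text = text.strip()
    pos = 0  # current read position, shared with the inner function

    def parse_list():
        nonlocal pos
        if pos >= len(text) or text[pos] != '(':
            raise ValueError("expected '(' at position %d" % pos)
        pos += 1  # consume '('
        items = []
        while True:
            if pos < len(text) and text[pos] == '(':
                # A nested list: recurse.
                items.append(parse_list())
            else:
                # A plain item: read up to the next delimiter.
                start = pos
                while pos < len(text) and text[pos] not in '(),':
                    pos += 1
                items.append(text[start:pos].strip())
            if pos >= len(text):
                raise ValueError("unbalanced parentheses")
            if text[pos] == ',':
                pos += 1  # more items follow
            elif text[pos] == ')':
                pos += 1  # end of this list
                return items
            else:
                raise ValueError("unexpected %r at position %d" % (text[pos], pos))

    result = parse_list()
    if pos != len(text):
        raise ValueError("trailing characters at position %d" % pos)
    return result


print(parse_nested("(xx,yyy,(aa,bb,(dd,ee),ff,...))"))
# ['xx', 'yyy', ['aa', 'bb', ['dd', 'ee'], 'ff', '...']]
```

Unlike the pyparsing version, this returns the outer list directly rather than wrapped in another list, and it raises ValueError on unbalanced input.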
Upvotes: 3
Reputation: 123897
How about like this?
>>> import ast
>>> import re
>>> s = "(xx,yyy,(aa,bb,ccc))"
>>> x = re.sub(r"(\w+)", r'"\1"', s)  # wrap each word in double quotes
>>> x
'("xx","yyy",("aa","bb","ccc"))'
>>> ast.literal_eval(x)
('xx', 'yyy', ('aa', 'bb', 'ccc'))
Upvotes: 2
Reputation: 12806
I don't know that this is better, but it's a different way to do it, using a regex as previously suggested:
import re

def processInput(inputStr):
    # Split on every comma, then strip the parentheses from each piece.
    value = [re.sub(r'\(*\)*', '', i) for i in inputStr.split(',')]
    return value[0], value[1], value[2:]
Alternatively, you could use two chained str.replace() calls in lieu of the regex.
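A minimal sketch of the chained-replace variant, assuming the flat format from the question (one level of inner parentheses); process_input is just an illustrative name:

```python
def process_input(input_str):
    # Split on commas first, then drop any parentheses left on each piece.
    values = [part.replace('(', '').replace(')', '').strip()
              for part in input_str.split(',')]
    return values[0], values[1], values[2:]


print(process_input('(xx,yyy,(aa,bb,cc))'))
# ('xx', 'yyy', ['aa', 'bb', 'cc'])
```

Like the regex version, this does no validation: any parentheses are simply discarded, so malformed input is not detected.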
Upvotes: 1
Reputation: 60033
Let's use regular expressions!
/\(([^,]+),([^,]+),\(([^)]+)\)\)/
Match against that: the first capturing group contains xx, the second contains yyy, and splitting the third on , gives you your list.
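In Python, that could look roughly like the sketch below. The pattern is the one above without the surrounding slashes; the names PATTERN and parse_with_regex are made up for illustration, and fullmatch is used on the assumption that the whole string must fit the format.

```python
import re

# Same pattern as above, written as a Python raw string.
PATTERN = re.compile(r'\(([^,]+),([^,]+),\(([^)]+)\)\)')


def parse_with_regex(text):
    m = PATTERN.fullmatch(text.strip())
    if m is None:
        return None  # input does not have the expected shape
    # Groups 1 and 2 are the two leading items; group 3 is the inner list.
    return m.group(1), m.group(2), m.group(3).split(',')


print(parse_with_regex('(xx,yyy,(aa,bb,cc))'))
# ('xx', 'yyy', ['aa', 'bb', 'cc'])
```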
Upvotes: 2