Parsing names with pyparsing

Question

I have a file of names and ages:

john 25 
bob 30 
john bob 35

Here is what I have so far:

from pyparsing import *

data = '''
    john 25 
    bob 30 
    john bob 35
'''

name = Word(alphas + Optional(' ') + alphas)

rowData = Group(name +
                Suppress(White(" ")) +
                Word(nums))

table = ZeroOrMore(rowData)

print table.parseString(data)

The output I am expecting is:

[['john', 25], ['bob', 30], ['john bob', 35]]

Here is the stack trace:

Traceback (most recent call last):
  File "C:\Users\mccauley\Desktop\client.py", line 11, in <module>
    eventType = Word(alphas + Optional(' ') + alphas)
  File "C:\Python27\lib\site-packages\pyparsing.py", line 1657, in __init__
    self.name = _ustr(self)
  File "C:\Python27\lib\site-packages\pyparsing.py", line 122, in _ustr
    return str(obj)
  File "C:\Python27\lib\site-packages\pyparsing.py", line 1743, in __str__
    self.strRepr = "W:(%s)" % charsAsStr(self.initCharsOrig)
  File "C:\Python27\lib\site-packages\pyparsing.py", line 1735, in charsAsStr
    if len(s)>4:
TypeError: object of type 'And' has no len()

vicvicvic · Accepted Answer

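The traceback comes from Word(alphas + Optional(' ') + alphas). Word expects a string of allowed characters, but adding Optional(' ') to a string produces a parser expression (an And), which has no length. You also don't need to match the space yourself: pyparsing skips whitespace between tokens by default so that you can write cleaner grammars. So, your name parser should be something more like: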

# Parse for a name with an optional surname
# Note that pyparsing is built to accept "john doe" or "john        doe"
name = Word(alphas) + Optional(Word(alphas))
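
To see the whitespace skipping in action, here is a minimal sketch that runs the name parser on its own. It assumes pyparsing is installed; the variable names single and spaced are only for illustration.

```python
from pyparsing import Word, Optional, alphas

# A first name, optionally followed by a second word (the surname).
name = Word(alphas) + Optional(Word(alphas))

# Whitespace between tokens is skipped by default, however much there is.
single = name.parseString("john").asList()
spaced = name.parseString("john        doe").asList()

print(single)  # ['john']
print(spaced)  # ['john', 'doe']
```

Note that each word comes back as a separate token; the space itself is never part of the result.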

And then, the row parser:

# Parses a row of a name and an age
row = Group(name) + Word(nums)

You'll get a rather nested structure for each row, though: ([(['john', 'doe'], {}), '25'], {}) is the repr of the parse results, i.e. a group holding the name words, followed by the age as a string. I hope you can see how to work with this. If your data is line-based, I'd recommend not using pyparsing to parse the whole string at once, but parsing it line by line instead. That keeps things simpler, I think:

for line in input_string.splitlines():
    if not line.strip():
        continue  # skip blank lines, which the row parser would reject
    results = row.parseString(line)
    # Do something with results...
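
Putting it together, one possible way to get exactly the list the question asks for is to attach parse actions that join the name words and convert the age to an int. This is a sketch rather than the only approach: the parse actions and the age and table names are my own additions, and it drops the Group so that each row is a flat [name, age] pair.

```python
from pyparsing import Word, Optional, alphas, nums

# Join the one or two name words back into a single string.
name = (Word(alphas) + Optional(Word(alphas))).setParseAction(
    lambda tokens: " ".join(tokens)
)
# Convert the matched digits into an int.
age = Word(nums).setParseAction(lambda tokens: int(tokens[0]))
row = name + age

data = '''
    john 25
    bob 30
    john bob 35
'''

table = []
for line in data.splitlines():
    if not line.strip():
        continue  # skip blank lines, which the row parser would reject
    table.append(row.parseString(line).asList())

print(table)  # [['john', 25], ['bob', 30], ['john bob', 35]]
```

The trade-off of joining with a single space is that the original spacing between the two words is lost, which is usually what you want for names.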
