code_warrior

Building a dictionary from a string containing one or more tokens

given

import pyparsing as pp

lines = '''\
(xcoord -23899.747)
(ycoord 14349.544)
(elev 23899)
(region "mountainous")
(rate multiple)'''

leftParen    = pp.Literal('(')
rightParen   = pp.Literal(')')
doublequote  = pp.Literal('"')
v_string = pp.Word(pp.alphanums)
v_quoted_string = pp.Combine( doublequote + v_string + doublequote)
v_number = pp.Word(pp.nums+'.'+'-')

keyy = v_string
valu = v_string | v_quoted_string | v_number

item  = pp.Group( pp.Literal('(').suppress() + keyy + valu + pp.Literal(')').suppress() )
items = pp.ZeroOrMore( item)
dicct = pp.Dict( items)

pp.ParserElement.setDefaultWhitespaceChars('\r\n\t ')
print("item yields: ", item.parseString(lines).dump())
print("items yields: ", items.parseString(lines).dump())
print("dicct yields: ", dicct.parseString(lines).dump())

gives

item yields: [['xcoord', '-23899.747']]
[0]:['xcoord', '-23899.747']
items yields: [['xcoord', '-23899.747']]
[0]:['xcoord', '-23899.747']
dicct yields: [['xcoord', '-23899.747']]
[0]:['xcoord', '-23899.747']

Hm. I'd expect to see five items within dicct. My use of Dict, ZeroOrMore and Group seems consistent with other examples on the net. It seems like only the first item gets matched. What am I doing wrong?

TIA,

code-warrior
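One way to confirm that only the first item is matched, and to see why, is the minimal sketch below. It assumes pyparsing is installed and rebuilds the question's grammar. parseString without parseAll=True silently returns whatever prefix it managed to match, while parseAll=True turns the unparsed remainder into a ParseException. Trying the value alternatives on 14349.544 on their own shows where the second item goes wrong.

```python
import pyparsing as pp

lines = '''\
(xcoord -23899.747)
(ycoord 14349.544)
(elev 23899)
(region "mountainous")
(rate multiple)'''

# Same value grammar as in the question: v_string is tried first.
v_string = pp.Word(pp.alphanums)
v_quoted_string = pp.Combine(pp.Literal('"') + v_string + pp.Literal('"'))
v_number = pp.Word(pp.nums + '.' + '-')
valu = v_string | v_quoted_string | v_number

# '|' (MatchFirst) takes the first alternative that matches. Digits are
# alphanumeric, so v_string grabs "14349" and leaves ".544" behind.
print(valu.parseString('14349.544'))  # ['14349']

item = pp.Group(pp.Literal('(').suppress() + v_string + valu
                + pp.Literal(')').suppress())
items = pp.ZeroOrMore(item)

# pyparsing does not backtrack into valu, so the second item fails at ".544"
# and ZeroOrMore quietly stops after the first item.
print(items.parseString(lines))  # [['xcoord', '-23899.747']]

# parseAll=True reports the unparsed remainder as an error instead.
try:
    items.parseString(lines, parseAll=True)
except pp.ParseException as err:
    print('parsing stopped:', err)
```

With v_number moved to the front of valu, the same items expression consumes all five lines.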


Answers (1)

Bill Bell

This is easier to do than you might think. (It just takes weeks of practice for some of us.)

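  • v_number, to represent numeric values, and v_string, to represent unquoted string values, are fairly straightforward. v_number is listed first in value, and v_string is built from alphas rather than alphanums; either change on its own stops v_string from matching just the leading digits 14349 of 14349.544, which is why your grammar stopped after the first item.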
  • I've used Combine with quoted strings so that the quotation marks are included with the strings in the parsed results.
  • I've used Group with key and value so that these values are paired in the output from the parser.
  • ZeroOrMore is there to allow for any number of key-value pairs, including zero.
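The effect of Group and Combine can be seen by parsing two of the sample lines with and without Group. This is a minimal sketch that assumes pyparsing is installed and reuses the element definitions from the code below; the names sample, flat and grouped are illustrative only.

```python
import pyparsing as pp

key = pp.Word(pp.alphas)
v_number = pp.Word(pp.nums + '.' + '-')
v_string = pp.Word(pp.alphas)
# Combine joins the opening quote, the word and the closing quote into one token.
v_quoted_string = pp.Combine(pp.Literal('"') + v_string + pp.Literal('"'))
value = v_number | v_string | v_quoted_string

lpar = pp.Literal('(').suppress()
rpar = pp.Literal(')').suppress()

sample = '(elev 23899)\n(region "mountainous")'

# Without Group, every token lands in one flat list.
flat = pp.ZeroOrMore(lpar + key + value + rpar)
print(flat.parseString(sample))     # ['elev', '23899', 'region', '"mountainous"']

# With Group, each key stays paired with its value.
grouped = pp.ZeroOrMore(lpar + pp.Group(key + value) + rpar)
print(grouped.parseString(sample))  # [['elev', '23899'], ['region', '"mountainous"']]
```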

lines = '''\
(xcoord -23899.747)
(ycoord 14349.544)
(elev 23899)
(region "mountainous")
(rate multiple)'''


import pyparsing as pp

key = pp.Word(pp.alphas)
v_number = pp.Word(pp.nums + '.' + '-')
v_string = pp.Word(pp.alphas)
# Combine keeps the quotation marks attached to the quoted value
v_quoted_string = pp.Combine(pp.Literal('"') + v_string + pp.Literal('"'))
# '|' takes the first alternative that matches, so v_number is tried first
value = v_number | v_string | v_quoted_string
# Group pairs each key with its value; the parentheses are suppressed
item = pp.Literal('(').suppress() + pp.Group(key + value) + pp.Literal(')').suppress()
collection = pp.ZeroOrMore(item)

result = {}
for pair in collection.parseString(lines):
    result[pair[0]] = pair[1]

for name, val in result.items():
    print(name, val)

Output:

xcoord -23899.747
ycoord 14349.544
elev 23899
region "mountainous"
rate multiple
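Since the question is about Dict: wrapping the grouped pairs in pp.Dict should produce the same mapping without the manual loop, because Dict uses the first token of each group as a results name for the rest of that group. This is a sketch reusing the grammar above, assuming pyparsing is installed; parseString, parseAll and asDict exist in pyparsing 2.x and, as compatibility names, in 3.x.

```python
import pyparsing as pp

lines = '''\
(xcoord -23899.747)
(ycoord 14349.544)
(elev 23899)
(region "mountainous")
(rate multiple)'''

key = pp.Word(pp.alphas)
v_number = pp.Word(pp.nums + '.' + '-')
v_string = pp.Word(pp.alphas)
v_quoted_string = pp.Combine(pp.Literal('"') + v_string + pp.Literal('"'))
value = v_number | v_string | v_quoted_string
item = pp.Literal('(').suppress() + pp.Group(key + value) + pp.Literal(')').suppress()

# Dict turns each Group's first token into a results name for its value.
collection = pp.Dict(pp.ZeroOrMore(item))

# parseAll=True makes any unparsed leftover input an error instead of ignoring it.
parsed = collection.parseString(lines, parseAll=True)
print(parsed['elev'])  # 23899
print(parsed.asDict())
```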
