`pyparsing`: iterating over `ParseResults`

Question

I started using pyparsing this evening and have built a complex grammar that describes the sources I'm working with very effectively. It was very easy and very powerful. However, I'm having trouble working with ParseResults: I need to iterate over nested tokens in the order in which they are found, and I'm finding it a little frustrating. I've reduced my problem to a simple case:

import pyparsing as pp

word = pp.Word(pp.alphas + ',.')('word*')
direct_speech = pp.Suppress('“') + pp.Group(pp.OneOrMore(word))('direct_speech*') + pp.Suppress('”')
sentence = pp.Group(pp.OneOrMore(word | direct_speech))('sentence')

test_string = 'Lorem ipsum “dolor sit” amet, consectetur.'

r = sentence.parseString(test_string)

print r.asXML('div')

print ''

for name, item in r.sentence.items():
    print name, item

print ''

for item in r.sentence:
    print item.getName(), item.asList()

As far as I can see, this ought to work. Here is the output:

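<div>
  <sentence>
    <word>Lorem</word>
    <word>ipsum</word>
    <direct_speech>
      <word>dolor</word>
      <word>sit</word>
    </direct_speech>
    <word>amet,</word>
    <word>consectetur.</word>
  </sentence>
</div>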

word ['Lorem', 'ipsum', 'amet,', 'consectetur.']
direct_speech [['dolor', 'sit']]

Traceback (most recent call last):
  File "./test.py", line 27, in <module>
    print item.getName(), item.asList()
AttributeError: 'str' object has no attribute 'getName'

The XML output seems to indicate that the string is parsed exactly as I would wish, but I can't iterate over the sentence, for example, to reconstruct it.

Is there a way to do what I need to?

Thanks!

edit:

I've been using this:

for item in r.sentence:
    if isinstance(item, basestring):
        print item
    else:
        print item.getName(), item

But it doesn't help me all that much, because I can't distinguish between different types of string. Here is a slightly expanded example:

word = pp.Word(pp.alphas + ',.')('word*')
number = pp.Word(pp.nums + ',.')('number*')

direct_speech = pp.Suppress('“') + pp.Group(pp.OneOrMore(word | number))('direct_speech*') + pp.Suppress('”')
sentence = pp.Group(pp.OneOrMore(word | number | direct_speech))('sentence')

test_string = 'Lorem 14 ipsum “dolor 22 sit” amet, consectetur.'

r = sentence.parseString(test_string)

for i, item in enumerate(r.sentence):
    if isinstance(item, basestring):
        print i, item
    else:
        print i, item.getName(), item

The output is:

0 Lorem
1 14
2 ipsum
3 word ['dolor', '22', 'sit']
4 amet,
5 consectetur.

Not too helpful: I can't distinguish between word and number, and the direct_speech element is labelled word?!

I'm obviously missing something. All I want to do is:

for item in r.sentence:
    if (item is a number):
        do something
    elif (item is a word):
        do something else
etc. ...

Should I be approaching this differently?

simon · Accepted Answer

Well, I've tried a number of different approaches now and I can't get what I need, so (absurd though it seems) I'm using .asXML() and parsing the resulting XML. Here's my example:

import pyparsing as pp

word = pp.Word(pp.alphas + ',.')('word*')
number = pp.Word(pp.nums + ',.')('number*')
direct_speech = pp.Suppress('“') + pp.Group(pp.OneOrMore(word | number))('direct_speech*') + pp.Suppress('”')
sentence = pp.Group(pp.OneOrMore(word | number | direct_speech))('sentence')

test_string = 'Lorem 14 ipsum “dolor 22 sit” amet, consectetur.'
r = sentence.parseString(test_string)

from lxml import etree
xml = etree.fromstring(r.sentence.asXML('sentence'))
for el in xml:
    if len(el):
        print el.tag
        for sub_el in el:
            print '  ', sub_el.tag, ':', sub_el.text
    else:
        print el.tag, ':',  el.text

which outputs:

word : Lorem
number : 14
word : ipsum
direct_speech
   word : dolor
   number : 22
   word : sit
word : amet,
word : consectetur.

It seems like a long way around the houses, but there doesn't seem to be a better way.
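
For comparison, here is a minimal sketch of one possible way to avoid the XML round trip: attach a parse action to each expression so that every token carries its own type, then iterate in parse order. It assumes Python 3 and drops the results names from the grammar above. `Token` and `tag` are hypothetical helper names introduced for this illustration, not part of pyparsing.

```python
from dataclasses import dataclass

import pyparsing as pp


@dataclass
class Token:
    """Hypothetical wrapper pairing a matched value with the rule that matched it."""
    kind: str
    value: object


def tag(kind):
    """Build a parse action that wraps the single matched token in a Token."""
    def action(tokens):
        return Token(kind, tokens[0])
    return action


word = pp.Word(pp.alphas + ',.').setParseAction(tag('word'))
number = pp.Word(pp.nums + ',.').setParseAction(tag('number'))

# The Group yields one nested result; its parse action turns that into a
# single Token whose value is the list of inner Tokens.
speech_body = pp.Group(pp.OneOrMore(word | number))
speech_body.setParseAction(lambda tokens: Token('direct_speech', list(tokens[0])))
direct_speech = pp.Suppress('“') + speech_body + pp.Suppress('”')

sentence = pp.OneOrMore(word | number | direct_speech)

test_string = 'Lorem 14 ipsum “dolor 22 sit” amet, consectetur.'
tokens = sentence.parseString(test_string).asList()

# Items come back in the order they were found, each labelled with its type.
for item in tokens:
    if item.kind == 'direct_speech':
        print(item.kind)
        for sub_item in item.value:
            print('  ', sub_item.kind, ':', sub_item.value)
    else:
        print(item.kind, ':', item.value)
```

The design choice here is to store the type on the token itself rather than in the results names, so a plain loop over the parsed list can branch on `item.kind` for words, numbers and nested direct speech alike.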
