halloleo

Generally flatten returned lists in pyparsing

This one is a bit lengthy to explain, so bear with me: with pyparsing I have to analyse many text parts like this:

first multi segment part 123 45 67890 third multi segment part

------------^----------- -----^------ ------------^-----------
  Part A: alpha words    B: num words   Part C: alpha words

I tried to use pp.OneOrMore for each part:

import pyparsing as pp

a = pp.OneOrMore(pp.Word(pp.alphas)).setName("PART_A")('A')
b = pp.OneOrMore(pp.Word(pp.nums)).setName("PART_B")('B')
c = pp.OneOrMore(pp.Word(pp.alphas)).setName("PART_C")('C')
expr = a + b + c

When I run this over the string "first multi segment part 123 45 67890 third multi segment part", I get:

- A: ['first', 'multi', 'segment', 'part']
- B: ['123', '45', '67890']
- C: ['third', 'multi', 'segment', 'part']

However, I want each result flattened into a single string:

- A: 'first multi segment part'
- B: '123 45 67890'
- C: 'third multi segment part'

For this I can use the setParseAction method. Because I will have a lot of constructs using this feature, I extended the OneOrMore class like this:

class OneOrMoreJoined(pp.OneOrMore):
    """OneOrMore with results joined to one string"""
    def __init__(self, expr, stopOn=None, joinString=' '):
        super(OneOrMoreJoined, self).__init__(expr, stopOn=stopOn)
        # after a match, replace the list of matched tokens with one joined string
        self.setParseAction(joinString.join)

With this class I get the desired result. :-)
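A quick check of the class, as a sketch: the snippet below repeats OneOrMoreJoined (with Python 3's zero-argument super()) so it runs on its own, builds the three parts from the question with it, and parses the sample string. It assumes pyparsing is installed and that its OneOrMore accepts the stopOn keyword, as the class above already requires.

```python
import pyparsing as pp


class OneOrMoreJoined(pp.OneOrMore):
    """OneOrMore with results joined to one string"""

    def __init__(self, expr, stopOn=None, joinString=" "):
        super().__init__(expr, stopOn=stopOn)
        # Replace the matched tokens with a single joined string.
        self.setParseAction(joinString.join)


a = OneOrMoreJoined(pp.Word(pp.alphas)).setName("PART_A")("A")
b = OneOrMoreJoined(pp.Word(pp.nums)).setName("PART_B")("B")
c = OneOrMoreJoined(pp.Word(pp.alphas)).setName("PART_C")("C")
expr = a + b + c

result = expr.parseString(
    "first multi segment part 123 45 67890 third multi segment part"
)
print(result["A"])  # first multi segment part
print(result["B"])  # 123 45 67890
print(result["C"])  # third multi segment part
```

Because the parse action returns a plain string, each results name maps to that string rather than to a list of words.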

However, what can I do if I want a sequence such as d1 + d2 to be joined?

d1 = pp.Word(pp.nums).setName("PART_D1")
d2 = pp.Word(pp.alphas).setName("PART_D2")
expr = (d1 + d2)('D')

Of course, I can create a new class AndJoined and use AndJoined(d1, d2), but then I lose the nice d1 + d2 notation.

Is there a general way to flatten results? I could of course flatten the ParseResults manually after retrieving the dict, but I suspect there is an easy way to express this inside pyparsing...


Answers (1)

PaulMcG

The simplest would be to write a small helper like this:

joiner = lambda expr: expr.addParseAction(' '.join)

Then insert joiner into your grammar wherever you need it:

a_b_c = joiner(a + b + c | d + pp.Optional(e))
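To connect this back to the question, here is a minimal sketch that applies the joiner helper to the question's d1 + d2 sequence and to a plain OneOrMore, so neither an AndJoined nor a OneOrMoreJoined class is needed. It assumes pyparsing is installed; expr_a, expr_d, res_a and res_d are just illustrative names.

```python
import pyparsing as pp

# Helper from the answer: attach a parse action that joins all matched
# tokens with single spaces. addParseAction mutates expr and returns it.
joiner = lambda expr: expr.addParseAction(" ".join)

# The question's sequence d1 + d2, joined while keeping the + notation.
d1 = pp.Word(pp.nums).setName("PART_D1")
d2 = pp.Word(pp.alphas).setName("PART_D2")
expr_d = joiner(d1 + d2)("D")

# The same helper also covers the OneOrMore case.
expr_a = joiner(pp.OneOrMore(pp.Word(pp.alphas)))("A")

res_d = expr_d.parseString("123 abc")
res_a = expr_a.parseString("first multi segment part")
print(res_d["D"])  # 123 abc
print(res_a["A"])  # first multi segment part
```

Because addParseAction modifies the expression it is given and returns that same object, joiner(x) also changes x itself. Wrapping a freshly built expression, as here, avoids surprises; if x is shared elsewhere in the grammar, pass x.copy() instead.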

Just make sure that the tokens passed to joiner's parse action are single-level (not nested). If they are nested, you might need a flattener routine of your own (here called flatten), but this is easily added by writing joiner as:

joiner = lambda expr: expr.addParseAction(flatten, ' '.join)
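The answer leaves flatten undefined. One possible version is sketched below; the function is hypothetical (it is not part of pyparsing) and simply walks any nested iterable, such as a ParseResults containing Group results or a plain nested list, returning one flat list of strings. It is written without importing pyparsing so the logic can be checked on ordinary lists.

```python
def flatten(tokens):
    """Hypothetical flattener parse action (not part of pyparsing).

    Walks any nested iterable (pyparsing ParseResults or plain lists) and
    returns a flat list of strings; strings and non-iterables are leaves.
    """
    flat = []
    for tok in tokens:
        if isinstance(tok, str) or not hasattr(tok, "__iter__"):
            # Leaf token: convert to str so that " ".join can consume it.
            flat.append(str(tok))
        else:
            # Nested group (e.g. produced by pp.Group): recurse into it.
            flat.extend(flatten(tok))
    return flat


nested = ["first", ["multi", ["segment"]], "part", 42]
print(flatten(nested))            # ['first', 'multi', 'segment', 'part', '42']
print(" ".join(flatten(nested)))  # first multi segment part 42
```

When attached as in the answer, pyparsing runs the parse actions in the order given, so " ".join receives the list that flatten returned rather than the original nested tokens.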

