Reputation: 10384
This one is a bit lengthy to explain, so bear with me. With pyparsing
I have to analyse many text parts like:
first multi segment part 123 45 67890 third multi segment part
------------^----------- -----^------ ------------^-----------
  Part A: alpha words    B: num words   Part C: alpha words
I tried to use pp.OneOrMore for each part:
import pyparsing as pp

a = pp.OneOrMore(pp.Word(pp.alphas)).setName("PART_A")('A')
b = pp.OneOrMore(pp.Word(pp.nums)).setName("PART_B")('B')
c = pp.OneOrMore(pp.Word(pp.alphas)).setName("PART_C")('C')
expr = a + b + c
When I run this over the string "first multi segment part 123 45 67890 third multi segment part" I get:
- A: ['first', 'multi', 'segment', 'part']
- B: ['123', '45', '67890']
- C: ['third', 'multi', 'segment', 'part']
However I want all results flattened like:
- A: 'first multi segment part'
- B: '123 45 67890'
- C: 'third multi segment part'
For this I can use the setParseAction function. Because I will have a lot of constructs using this feature, I extended the OneOrMore class like this:
class OneOrMoreJoined(pp.OneOrMore):
    """OneOrMore with results joined to one string"""
    def __init__(self, expr, stopOn=None, joinString=' '):
        super(OneOrMoreJoined, self).__init__(expr, stopOn=stopOn)
        self.setParseAction(joinString.join)
With this class I get the desired result. :-)
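For comparison, the plain setParseAction route mentioned above, without the subclass, might look roughly like this. It is only a sketch: it reuses the three parts and the sample string from above and attaches `' '.join` to each part by hand, which is the repetition the subclass is meant to avoid.

```python
import pyparsing as pp

# Each part joins its own tokens into one space-separated string.
# setParseAction returns the expression itself, so the results name
# can be attached right after it.
a = pp.OneOrMore(pp.Word(pp.alphas)).setParseAction(' '.join)('A')
b = pp.OneOrMore(pp.Word(pp.nums)).setParseAction(' '.join)('B')
c = pp.OneOrMore(pp.Word(pp.alphas)).setParseAction(' '.join)('C')
expr = a + b + c

text = "first multi segment part 123 45 67890 third multi segment part"
result = expr.parseString(text)

print(result['A'])  # first multi segment part
print(result['B'])  # 123 45 67890
print(result['C'])  # third multi segment part
```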
However, what can I do if I want a sequence d1 + d2 to be joined?
d1 = pp.Word(pp.nums).setName("PART_D1")
d2 = pp.Word(pp.alphas).setName("PART_D2")
expr = (d1 + d2)('D')
Of course I can create a new class AndJoined and use AndJoined(d1, d2), but then I lose the nice notation d1 + d2.
Is there a general way to flatten results? I could of course flatten the ParseResults manually after I retrieve the dict, but I suspect there is an easy way to express this inside pyparsing...
Upvotes: 3
Views: 315
Reputation: 63709
The simplest would be to write a small helper like this:
joiner = lambda expr: expr.addParseAction(' '.join)
Then insert joiner in your grammar wherever you need it:
a_b_c = joiner(a + b + c | d + Optional(e))
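Applied to the d1 + d2 case from the question, the helper could be used as sketched below. The sample input "42 apples" is an assumption made up for illustration, and joiner is written as a def rather than a lambda purely for readability.

```python
import pyparsing as pp

def joiner(expr):
    # addParseAction attaches the action to expr itself and returns expr
    return expr.addParseAction(' '.join)

d1 = pp.Word(pp.nums).setName("PART_D1")
d2 = pp.Word(pp.alphas).setName("PART_D2")

# The d1 + d2 notation is kept; only the combined expression is wrapped.
expr = joiner(d1 + d2)('D')

result = expr.parseString("42 apples")  # assumed sample input
print(result['D'])  # 42 apples
```

Because addParseAction modifies the expression it is given, it is safest to wrap a freshly built combination such as d1 + d2. Calling joiner(d1) directly would attach the join action to d1 everywhere it is used.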
Just make sure that the tokens passed to joiner are all single-level tokens. If they are nested, then you might need a flattener routine, but this is easily added by writing joiner as:
joiner = lambda expr: expr.addParseAction(flatten, ' '.join)
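The flatten routine is not defined above, so here is one possible sketch of it. The function body, the Group-based nesting, and the sample input are all assumptions for illustration, not part of pyparsing itself.

```python
import pyparsing as pp

def flatten(tokens):
    """Hypothetical helper: recursively flatten nested token lists."""
    flat = []
    for tok in tokens:
        if isinstance(tok, (pp.ParseResults, list)):
            flat.extend(flatten(tok))  # descend into nested groups
        else:
            flat.append(tok)
    return flat

def joiner(expr):
    # Parse actions run in order: flatten first, then join the flat tokens.
    return expr.addParseAction(flatten, ' '.join)

# Group produces a nested token list, which a bare ' '.join cannot handle.
words = pp.Group(pp.OneOrMore(pp.Word(pp.alphas)))
expr = joiner(pp.Word(pp.nums) + words)

result = expr.parseString("123 first multi segment")  # assumed sample input
print(result[0])
```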
Upvotes: 1