Celdor

Reputation: 2597

Pyparsing: grouping results by keywords ending with colons

I am new to pyparsing. Although I tried to read through the docs, I could not solve my first problem: grouping expressions of the form "keyword: token(s)". I ended up with this code:

import pyparsing as pp
from pprint import pprint

token = pp.Word(pp.alphas)
keyword = pp.Combine(token + pp.Literal(":"))
expr = pp.Group(keyword[1] + token[1, ...])

pprint(expr.parse_string("keyA: aaa bb ccc keyB: ddd eee keyC: xxx yyy hhh zzz").as_list())

Parsing stops partway through, and the name of the second keyword is consumed as a regular token. The result is as follows.

The expression:

keyA: aaa bb ccc keyB: ddd eee keyC: xxx yyy hhh zzz

gets parsed into:

[['keyA:', 'aaa', 'bb', 'ccc', 'keyB']]

I cannot figure out how to define keyword and token.


Edit.

In general, I'd like to parse the following expression:

keyword1: token11 token12 ... keyword2: token21 & token22 token23 keyword3: (token31 token32) & token33

into the following list:

[
    ["keyword1:", "token11", "token12", ...],
    ["keyword2:", ["token21", "&", "token22"], "token23"],
    ["keyword3:", [["token31", "token32"], "&", "token33"]],
]

Upvotes: 1

Views: 64

Answers (2)

PaulMcG

Reputation: 63709

You were very close with your use of infixNotation to support the '&' operator in your original post. There, you had an expression like "a b & c", which you wanted to parse as ['a', ['b', '&', 'c']]. The first issue was telling tokens from keys, which you have resolved yourself. The second issue has to do with your operators. infixNotation lets you define an empty operator by passing Python's None as the operator expression. Since you want this empty operator to have lower precedence than '&', list it after '&' in the operator list:

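from pyparsing import Word, alphas, infixNotation, opAssoc

token = Word(alphas)  # stand-in for your own token definition

# operators are listed from highest to lowest precedence; None is the empty operator
expr = infixNotation(token,
    [
        ('&', 2, opAssoc.LEFT,),
        (None, 2, opAssoc.LEFT,),
    ])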

Use runTests to quickly run a bunch of tests:

expr.runTests("""\
    a b c
    a & b
    a & b & c
    a & b c
    a & (b c)
""", fullDump=False)

Prints:

a b c
[['a', 'b', 'c']]

a & b
[['a', '&', 'b']]

a & b & c
[['a', '&', 'b', '&', 'c']]

a & b c
[[['a', '&', 'b'], 'c']]

a & (b c)
[['a', '&', ['b', 'c']]]
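
To connect this with the keyword grouping from the question, here is a minimal sketch rather than a definitive grammar. It assumes pyparsing 3 and that tokens may contain digits (hence alphanums). The names word, keyword, token, body and grammar are illustrative only. A token is defined with a negative lookahead (~keyword), so a word followed directly by ':' is left for the next group:

```python
import pyparsing as pp

word = pp.Word(pp.alphanums)
keyword = pp.Combine(word + ":")
# Assumption: a token is any word that is not the start of a keyword.
token = ~keyword + word

# '&' first (higher precedence), then the empty operator for adjacent operands.
body = pp.infix_notation(token, [
    ("&", 2, pp.OpAssoc.LEFT),
    (None, 2, pp.OpAssoc.LEFT),
])
grammar = pp.Group(keyword + body)[1, ...]

text = ("keyword1: token11 token12 "
        "keyword2: token21 & token22 token23 "
        "keyword3: (token31 token32) & token33")
result = grammar.parse_string(text, parse_all=True).as_list()
for group in result:
    print(group)
```

Note that the empty operator groups its operands, so a run of adjacent tokens comes back as a nested list (for example ['keyword1:', ['token11', 'token12']]), one level deeper than the list asked for in the question. The lookahead is used here instead of WordEnd() because WordEnd() with its default word characters would also reject a token that is directly followed by ')'.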

Upvotes: 1

Celdor

Reputation: 2597

OK, so I was looking for a way to require that a token ends at a word boundary, i.e. that it is not immediately followed by a ':'. It turns out pyparsing has a WordEnd class for this. With it, the expression is parsed correctly.

import pyparsing as pp
from pprint import pprint

token = pp.Combine(pp.Word(pp.alphas) + pp.WordEnd())
keyword = pp.Combine(pp.Word(pp.alphas) + pp.Literal(":"))
expr = pp.Group(keyword[1] + token[1, ...])[1, ...]

pprint(expr.parse_string("keyA: aaa bb ccc keyB: ddd eee keyC: xxx yyy hhh zzz").as_list())

Prints:

[['keyA:', 'aaa', 'bb', 'ccc'],
 ['keyB:', 'ddd', 'eee'],
 ['keyC:', 'xxx', 'yyy', 'hhh', 'zzz']]
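
Why this works can be checked in isolation. The sketch below assumes pyparsing 3, and the variable names are illustrative. By default, WordEnd treats every printable non-whitespace character as a word character, and that includes ':'. So "keyB" followed directly by ':' is not at a word end, and the token fails to match there:

```python
import pyparsing as pp

word = pp.Word(pp.alphas)
token = pp.Combine(word + pp.WordEnd())

# parse_all=False: only a match at the start of the string is required.
plain_hits_key = word.matches("keyB: ddd", parse_all=False)   # bare Word accepts "keyB"
token_hits_key = token.matches("keyB: ddd", parse_all=False)  # rejected: ':' follows
token_hits_word = token.matches("aaa bb", parse_all=False)    # accepted: space follows

print(plain_hits_key, token_hits_key, token_hits_word)
```

The same default means a token followed directly by any other punctuation, such as ')' or '&' with no space, would be rejected too, which matters if the grammar is later extended with operators or parentheses.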

Upvotes: 1
