Reputation: 2597
I am new to pyparsing. Although I did try to read through the docs, I did not manage to solve the first problem of grouping expressions by "keyword: token(s)". I ended up with this code:
import pyparsing as pp
from pprint import pprint
token = pp.Word(pp.alphas)
keyword = pp.Combine(token + pp.Literal(":"))
expr = pp.Group(keyword[1] + token[1, ...])
pprint(expr.parse_string("keyA: aaa bb ccc keyB: ddd eee keyC: xxx yyy hhh zzz").as_list())
It stops in the middle and parses the second keyword as a regular token. The result is the following:
the expression:
keyA: aaa bb ccc keyB: ddd eee keyC: xxx yyy hhh zzz
gets parsed into:
[['keyA:', 'aaa', 'bb', 'ccc', 'keyB']]
I cannot figure out how to define keyword and token.
Edit.
In general, I'd like to parse the following expression:
keyword1: token11 token12 ... keyword2: token21 & token22 token23 keyword3: (token31 token32) & token33
into the following list:
[
["keyword1:", "token11", "token12", ...],
["keyword2:", ["token21", "&", "token22"], "token23"],
["keyword3:", [["token31", "token32"], "&", "token33"]],
]
Reputation: 63709
To add support for the '&' operator as in your original post, you were very close with your use of infixNotation. In your original, you had an expression like "a b & c", which you wanted to parse as ['a', ['b', '&', 'c']]. The first issue you had was the token vs. key issue, which you have resolved for yourself. The second issue has to do with your operators. In infixNotation it is possible to define an empty operator by passing Python None as the operator expression. Since you want this empty operator to have lower precedence than '&', list it after '&' when you define your expression:
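from pyparsing import infixNotation, opAssoc

# token is the operand expression from the question;
# operator levels are listed from highest to lowest precedence
expr = infixNotation(token,
    [
        ('&', 2, opAssoc.LEFT,),
        (None, 2, opAssoc.LEFT,),  # None = empty operator (plain adjacency)
    ])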
Use runTests to quickly run a bunch of tests:
expr.runTests("""\
a b c
a & b
a & b & c
a & b c
a & (b c)
""", fullDump=False)
Prints:
a b c
[['a', 'b', 'c']]
a & b
[['a', '&', 'b']]
a & b & c
[['a', '&', 'b', '&', 'c']]
a & b c
[[['a', '&', 'b'], 'c']]
a & (b c)
[['a', '&', ['b', 'c']]]
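For the full "keyword: tokens" input from the question, the empty-operator expression could be combined with the keyword grouping roughly as follows. This is only a sketch, assuming pyparsing 3 naming (infix_notation, OpAssoc, parse_string); the names word, operand, value, entry, grammar and parse_entries are made up for illustration. The negative lookahead ~keyword is an assumption on my part to stop an operand from swallowing the start of the next keyword. Note that the nesting follows the runTests output above, so it is not exactly the flat shape asked for in the question.

```python
import pyparsing as pp

word = pp.Word(pp.alphanums)
keyword = pp.Combine(word + ":")   # e.g. "keyA:" (no space before the colon)
operand = ~keyword + word          # a word that does not begin a keyword

# '&' binds tighter than the empty (adjacency) operator
value = pp.infix_notation(
    operand,
    [
        ("&", 2, pp.OpAssoc.LEFT),
        (None, 2, pp.OpAssoc.LEFT),
    ],
)

entry = pp.Group(keyword + value)
grammar = entry[1, ...]


def parse_entries(text):
    """Parse 'key: tokens key: tokens ...' into a nested list (hypothetical helper)."""
    return grammar.parse_string(text, parse_all=True).as_list()


print(parse_entries("keyA: a b keyB: c & d e"))
```

If the flat shape from the question is required, the empty operator would probably have to be dropped at the top level and replaced with a plain repetition of the '&' expression.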
Reputation: 2597
OK, so I was looking for a way to specify that token ends with an alphanumeric character and is not followed by ":". It turns out pyparsing has a class, WordEnd(), which I used; with it, the expression is parsed correctly.
import pyparsing as pp
from pprint import pprint
token = pp.Combine(pp.Word(pp.alphas) + pp.WordEnd())
keyword = pp.Combine(pp.Word(pp.alphas) + pp.Literal(":"))
expr = pp.Group(keyword[1] + token[1, ...])[1, ...]
pprint(expr.parse_string("keyA: aaa bb ccc keyB: ddd eee keyC: xxx yyy hhh zzz").as_list())
[['keyA:', 'aaa', 'bb', 'ccc'],
['keyB:', 'ddd', 'eee'],
['keyC:', 'xxx', 'yyy', 'hhh', 'zzz']]
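For comparison, the same grouping can probably also be had with a negative lookahead instead of WordEnd(). A minimal sketch, assuming pyparsing 3; the names value, grammar and parse_groups are only for illustration and are not from the code above:

```python
import pyparsing as pp

token = pp.Word(pp.alphas)
keyword = pp.Combine(token + ":")

# ~keyword is a negative lookahead: accept a token only if a keyword
# does NOT start at this position
value = ~keyword + token

grammar = pp.Group(keyword + value[1, ...])[1, ...]


def parse_groups(text):
    """Return one [keyword, token, ...] list per keyword (hypothetical helper)."""
    return grammar.parse_string(text, parse_all=True).as_list()


print(parse_groups("keyA: aaa bb ccc keyB: ddd eee"))
```

The lookahead states the intent ("a token is anything that is not a keyword") directly, at the cost of trying the keyword expression before every token.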