I have the following code that parses arithmetic expressions using parsimonious. It works, but whitespace is included in the parse tree. How can I get rid of the whitespace nodes and keep only the meaningful tokens? The Lark parsing library achieves this with the %ignore WS directive. Is there something similar in parsimonious, or another way to achieve the same effect?
from parsimonious.grammar import Grammar
# Raw string so that \d and \s reach the regex engine without Python escape warnings.
g = r'''
sum = (number plus sum) / (number plus prod)
prod = (number times prod) / (left_par number plus prod right_par) / number
number = (ws ~"[\d]+" ws) / (left_par sum right_par)
plus = ws "+" ws
times = ws "*" ws
left_par = ws "(" ws
right_par = ws ")" ws
ws = ~"[\s]*"
'''
grammar = Grammar(g)
print(grammar.parse(' (134 +77 + 56) + 10 * 30'))
This is the output:
<Node called "sum" matching " (134 +77 + 56) + 10 * 30">
    <Node matching " (134 +77 + 56) + 10 * 30">
        <Node called "number" matching " (134 +77 + 56) ">
            <Node matching " (134 +77 + 56) ">
                <Node called "left_par" matching " (">
                    <RegexNode called "ws" matching " ">
                    <Node matching "(">
                    <RegexNode called "ws" matching "">
                <Node called "sum" matching "134 +77 + 56">
                    <Node matching "134 +77 + 56">
                        <Node called "number" matching "134 ">
                            <Node matching "134 ">
                                <RegexNode called "ws" matching "">
                                <RegexNode matching "134">
                                <RegexNode called "ws" matching " ">
                        <Node called "plus" matching "+">
                            <RegexNode called "ws" matching "">
                            <Node matching "+">
                            <RegexNode called "ws" matching "">
                        <Node called "sum" matching "77 + 56">
                            <Node matching "77 + 56">
                                <Node called "number" matching "77 ">
                                    <Node matching "77 ">
                                        <RegexNode called "ws" matching "">
                                        <RegexNode matching "77">
                                        <RegexNode called "ws" matching " ">
                                <Node called "plus" matching "+ ">
                                    <RegexNode called "ws" matching "">
                                    <Node matching "+">
                                    <RegexNode called "ws" matching " ">
                                <Node called "prod" matching "56">
                                    <Node called "number" matching "56">
                                        <Node matching "56">
                                            <RegexNode called "ws" matching "">
                                            <RegexNode matching "56">
                                            <RegexNode called "ws" matching "">
                <Node called "right_par" matching ") ">
                    <RegexNode called "ws" matching "">
                    <Node matching ")">
                    <RegexNode called "ws" matching " ">
        <Node called "plus" matching "+ ">
            <RegexNode called "ws" matching "">
            <Node matching "+">
            <RegexNode called "ws" matching " ">
        <Node called "prod" matching "10 * 30">
            <Node matching "10 * 30">
                <Node called "number" matching "10 ">
                    <Node matching "10 ">
                        <RegexNode called "ws" matching "">
                        <RegexNode matching "10">
                        <RegexNode called "ws" matching " ">
                <Node called "times" matching "* ">
                    <RegexNode called "ws" matching "">
                    <Node matching "*">
                    <RegexNode called "ws" matching " ">
                <Node called "prod" matching "30">
                    <Node called "number" matching "30">
                        <Node matching "30">
                            <RegexNode called "ws" matching "">
                            <RegexNode matching "30">
                            <RegexNode called "ws" matching "">
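To make the desired result concrete, here is a minimal sketch of one possible post-processing step: walk the tree and collect the text of every leaf, skipping nodes produced by the ws rule. This is not a built-in parsimonious feature and not an equivalent of Lark's %ignore. The function name meaningful_tokens and the StubNode class are hypothetical, introduced only so the example runs without the library. The function relies only on the expr_name, text and children attributes, which parsimonious nodes also expose, so the result of grammar.parse(...) could presumably be passed in the same way.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class StubNode:
    """Hypothetical stand-in with the three attributes used below
    (expr_name, text, children); not part of parsimonious."""
    expr_name: str
    text: str
    children: List["StubNode"] = field(default_factory=list)


def meaningful_tokens(node, skip=("ws",)):
    """Return leaf texts left to right, ignoring rules named in `skip`."""
    if node.expr_name in skip:
        return []
    if not node.children:
        # Leaf node: keep it only if it matched a non-empty string.
        return [node.text] if node.text else []
    tokens = []
    for child in node.children:
        tokens.extend(meaningful_tokens(child, skip))
    return tokens


# Stub tree shaped like the "77 + 56" branch of the output above
# (anonymous nodes have an empty expr_name).
tree = StubNode("sum", "77 + 56", [
    StubNode("number", "77 ", [
        StubNode("ws", ""), StubNode("", "77"), StubNode("ws", " "),
    ]),
    StubNode("plus", "+ ", [
        StubNode("ws", ""), StubNode("", "+"), StubNode("ws", " "),
    ]),
    StubNode("prod", "56", [
        StubNode("ws", ""), StubNode("", "56"), StubNode("ws", ""),
    ]),
])

print(meaningful_tokens(tree))  # ['77', '+', '56']
```

This flattens the tree into a token list, so the nesting is lost. To keep the structure, the same skip test could be placed inside a parsimonious NodeVisitor subclass, for example in generic_visit, so that each rule's visit method only receives the non-whitespace children.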