Reputation: 61
I am using pyparsing to parse a nested expression which is formed by delimited lists but which includes some basic arithmetic (just multiplication, for instance). A sample expression could look like this:
(A, B, 2 * C, 3 * ( D, E, 2 * F, 3 *(G, H)), I )
The output should unfold the arithmetic:
( A, B, C, C, D, E, F, F, G, H, G, H, G, H, D, E, F, F, G, H, G, H, G, H, D, E, F, F, G, H, G, H, G, H, I )
Could somebody give me hint how to approach the problem?
I started like follows: since there's just the operation multiplication, I decided to use the '*' character as a delimiter in a somewhat weird list:
import pyparsing as pp
oddDelim = pp.Or([',', '*'])
weirdList = pp.Optional(',').suppress() + \
pp.delimitedList(pp.Or([pp.alphas, pp.pyparsing_common.number]), delim = oddDelim, combine = False) + \
pp.Optional('*').suppress()
nestedTest = pp.nestedExpr(content = weirdList)
Using this nestedTest expression I get a reasonable result:
[['A', 'B', 2, 'C', 3, ['D', 'E', 2, 'F', 3, ['G', 'H']], 'I']]
but I don't know how should I parse the tokens in order to properly unfold the arithmetics.
Instead of iterating over the tokens sequentially in a FOR loop, I would ideally like to start unfolding the arithmetic from the highest degree of nesting and progressively going down. But I don't know how...
Is nestedExpr the way to go? Or should I change the approach and use Forward or maybe infixNotation? I am very new into pyparsing I would be very grateful if I got some hints/ideas on this.
Thanks very much in advance for your help!
Cheers, Pau
Upvotes: 2
Views: 462
Reputation: 63762
If you want to use Forward() to roll our own recursive grammar, it is best to start with writing a BNF for your grammar. This will help you think straight about the problem space first, and then worry about the coding later.
Here is a rough BNF for what you've posted:
list_expr ::= '(' list_item [',' list_item]* ')'
list_item ::= term | mult_term | list_expr
mult_term ::= integer '*' list_item
term ::= A-Z
That is, each list enclosed in parentheses has a comma-delimited list of items, and each item can be a single character term, a multiplication expression of an integer, a '*' and an item, or a nested list in another set of parentheses.
To translate this to pyparsing, work bottom-up to define each expression. For instance, define a term using the new Char class (which is a single-character from a string of allowed characters):
term = pp.Char("ABCDEFGHI... etc.")
You'll need to use Forward for list_item, since it will need expressions that aren't defined yet, so Forward() gives you a placeholder. Then when you have term, mult_term, and list_expr defined, using '<<=' to "insert" the definition into the existing placeholder, like this:
list_item <<= term | mult_term | list_expr
Since you asked about infixNotation, I'll talk about that approach also. When using infixNotation, look at your input and identify what constitutes grouping, operators, and operands.
The grouping here is easy, it is done using ()'s, which is pretty standard, and infixNotation will treat them as such by default.
Next identify what the lowest-level operands are. You have two types of operands: integers and single alpha characters.
The two operators are '*' for multiplication, and ',' for addition.
Since you only asked for suggestions, I'll stop there and let you tackle/struggle with the next steps on your own.
Upvotes: 1