Simple nested expression matching with pyparsing

Question

I want to match an expression that looks like this:

(<some value with spaces and m$1124any crazy signs> (<more values>) <even more>)

I simply want to split those values along the round brackets (). So far I have tried to strip down pyparsing's s-expression example, which is far too extensive and hard to understand (IMHO).

I got as far as using nestedExpr, which reduces it to one line:

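import pyparsing as pp
parser = pp.nestedExpr(opener='(', closer=')')
# the input string shown above
example = "(<some value with spaces and m$1124any crazy signs> (<more values>) <even more>)"
print(parser.parseString(example, parseAll=True).asList())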

The result is also split at whitespace, which I do not want:

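skewed_output = [['<some', 'value', 'with', 'spaces', 'and', 'm$1124any', 'crazy', 'signs>',
                  ['<more', 'values>'], '<even', 'more>']]
expected_output = [['<some value with spaces and m$1124any crazy signs>',
                    ['<more values>'], '<even more>']]
best_output = [['some value with spaces and m$1124any crazy signs',
                ['more values'], 'even more']]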

Optionally, I'd gladly take any pointers to an understandable introduction on how to build a more detailed parser. I'd like to extract the values between the < > brackets and match them (see best_output), but I can always string.strip() them afterwards.

Thanks in advance!

yeputons · Accepted Answer

Pyparsing's nestedExpr accepts content and ignoreExpr arguments, which specify what counts as a "single item" of an s-expression. You can pass a QuotedString here. Unfortunately, I did not understand the difference between the two parameters from the docs well enough, but some experiments showed me that the following code should satisfy your requirements:

import pyparsing as pp

single_value = pp.QuotedString(quoteChar="<", endQuoteChar=">")
parser = pp.nestedExpr(opener="(", closer=")",
                       content=single_value,
                       ignoreExpr=None)

example = "(<some value with spaces and m$1124any crazy signs> (<more values>) <even more>)"
print(parser.parseString(example, parseAll=True))

Output:

[['some value with spaces and m$1124any crazy signs', ['more values'], 'even more']]

It expects the list to start with ( and end with ), and to contain lists or quoted strings, optionally separated by whitespace. Each quoted string must start with <, end with >, and must not contain > inside, since the first > closes it.
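The grammar described above is small enough to write by hand, which can help make the rules concrete. The following is only a minimal sketch in plain Python, not how pyparsing implements it. parse_nested is a hypothetical helper name, and unlike the pyparsing version it returns the list itself without the extra outer list.

```python
def parse_nested(text):
    """Parse '(<a> (<b>) <c>)' into ['a', ['b'], 'c'] (hypothetical helper)."""
    pos = 0

    def skip_ws():
        # Whitespace between items is optional and carries no meaning.
        nonlocal pos
        while pos < len(text) and text[pos].isspace():
            pos += 1

    def parse_list():
        nonlocal pos
        if pos >= len(text) or text[pos] != "(":
            raise ValueError(f"expected '(' at position {pos}")
        pos += 1  # consume "("
        items = []
        while True:
            skip_ws()
            if pos >= len(text):
                raise ValueError("unbalanced '('")
            ch = text[pos]
            if ch == ")":
                pos += 1  # consume ")"
                return items
            if ch == "(":
                items.append(parse_list())  # nested list
            elif ch == "<":
                end = text.find(">", pos)  # the first ">" closes the string
                if end == -1:
                    raise ValueError(f"unterminated '<' at position {pos}")
                items.append(text[pos + 1:end])
                pos = end + 1
            else:
                # Bare (unquoted) text is not part of this grammar.
                raise ValueError(f"unexpected {ch!r} at position {pos}")

    skip_ws()
    result = parse_list()
    skip_ws()
    if pos != len(text):
        raise ValueError(f"trailing input at position {pos}")
    return result


example = "(<some value with spaces and m$1124any crazy signs> (<more values>) <even more>)"
print(parse_nested(example))
# ['some value with spaces and m$1124any crazy signs', ['more values'], 'even more']
```

An input with an unquoted token, such as "(<a> foo)", raises ValueError here, which mirrors the rule that every item must be either a list or a <...> string.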

You can play around with the content and ignoreExpr parameters some more to find out that content=None, ignoreExpr=single_value makes the parser accept both quoted and unquoted strings (and split unquoted strings at spaces):

import pyparsing as pp

single_value = pp.QuotedString(quoteChar="<", endQuoteChar=">")
parser = pp.nestedExpr(opener="(", closer=")", ignoreExpr=single_value, content=None)

example = "(<some value with spaces and m$1124any crazy signs> (<more values>) <even more> foo (foo) <(foo)>)"
print(parser.parseString(example, parseAll=True))

Output:

[['some value with spaces and m$1124any crazy signs', ['more values'], 'even m<
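To see the two configurations side by side, here is a small sketch that feeds the same input to both. The names strict, lenient and mixed are made up for this illustration; the behaviour noted in the comments is what the two snippets above imply, so treat it as something to verify against your pyparsing version.

```python
import pyparsing as pp

single_value = pp.QuotedString(quoteChar="<", endQuoteChar=">")

# Configuration from the first snippet: only <...> strings count as items.
strict = pp.nestedExpr(opener="(", closer=")",
                       content=single_value, ignoreExpr=None)

# Configuration from the second snippet: bare words are accepted as well.
lenient = pp.nestedExpr(opener="(", closer=")",
                        ignoreExpr=single_value, content=None)

mixed = "(<quoted value> foo (bar) <(baz)>)"

lenient_result = lenient.parseString(mixed, parseAll=True).asList()
print(lenient_result)
# [['quoted value', 'foo', ['bar'], '(baz)']]

try:
    strict.parseString(mixed, parseAll=True)
    strict_accepts = True
except pp.ParseException as err:
    # The bare word "foo" is neither a list nor a <...> string.
    strict_accepts = False
    print("strict parser rejected the input:", err)
```

Note that the parentheses inside <(baz)> are kept as plain text, because the quoted string is matched before a nested list is tried.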



Some questions left open:


Why does pyparsing ignore whitespace between consecutive list items?
What is the difference between content and ignoreExpr, and when should one use each of them?
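On the first open question, part of the explanation is probably that pyparsing expressions skip leading whitespace by default before they try to match. Below is a small sketch of that default, using Word and leaveWhitespace(). Both are standard pyparsing, but the example is independent of nestedExpr, so the connection to it is an assumption rather than something confirmed above.

```python
import pyparsing as pp

word = pp.Word(pp.alphas)

# Default behaviour: whitespace in front of each token is skipped silently.
skipping = pp.OneOrMore(word).parseString("foo   bar").asList()
print(skipping)       # ['foo', 'bar']

# leaveWhitespace() switches the skipping off, so matching stops at the gap.
strict_word = word.copy().leaveWhitespace()
not_skipping = pp.OneOrMore(strict_word).parseString("foo   bar").asList()
print(not_skipping)   # ['foo']
```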
