Reputation: 1607
I am trying to use pyparsing to match nested expressions. Without having to specify the content expression, is there a way to use regular expressions to define the opener?
My opener consists of two tokens A and B. These two tokens may or may not be separated by whitespace and newline characters.
I am able to create a pyparsing expression for the opener when I specify a content rule. However, is there a way to do this without specifying a content rule? Alternatively, how can I specify a rule that ignores the content?
from pyparsing import Word, ZeroOrMore, SkipTo, nestedExpr

opener = Word('A') + ZeroOrMore(' ') + ZeroOrMore('\n') + Word('B')
closer = 'END'
content_rule = SkipTo(opener | closer)
pat = nestedExpr(opener=opener, closer=closer, content=content_rule)
for x in pat.scanString(data):
    print(x)
Context: I am trying to extract if-blocks from source code files. So I will need a way of extracting nested expressions. This requires me to specify:
Upvotes: 3
Views: 369
Reputation: 57188
You said:
I am not sure if this is possible however.
It isn't, at least for general C code. For example:
if (a) {
    char a = '}';
}
There's no good way for your parser to know (as opposed to guessing) that the first close curly-brace is not intended to close the if statement without actually parsing the interior. (Also comments, double-quoted strings, etc. Not to mention curly-brace-less ifs!)
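To make the problem concrete, here is a small standard-library sketch (not pyparsing) that compares plain brace counting with a scanner that skips quoted literals. The function names and the sample string are made up for illustration, and comments and brace-less ifs are still not handled.

```python
def naive_block_end(text, start):
    """Index just past the '}' that balances text[start], counting braces only."""
    depth = 0
    for i in range(start, len(text)):
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            depth -= 1
            if depth == 0:
                return i + 1
    return -1


def literal_aware_block_end(text, start):
    """Same idea, but braces inside '...' or "..." literals are not counted."""
    depth = 0
    i = start
    while i < len(text):
        ch = text[i]
        if ch in "'\"":
            quote = ch
            i += 1
            # Advance to the closing quote, stepping over backslash escapes.
            while i < len(text) and text[i] != quote:
                i += 2 if text[i] == "\\" else 1
        elif ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return i + 1
        i += 1
    return -1


code = "if (a) {\n    char a = '}';\n}"
start = code.index("{")
naive = code[start:naive_block_end(code, start)]
aware = code[start:literal_aware_block_end(code, start)]
```

Here `naive` stops at the brace inside the character literal, while `aware` runs to the real end of the block. Every further construct (comments, preprocessor tricks) needs the same kind of special handling, which is why "just ignore the content" does not work in general.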
If you're confident the interior only has balanced curly braces, and you only want top-level if statements, my suggestion would be to do something like this (untested, but hopefully it gets the idea across):
pat = Literal('if') + nestedExpr("{", "}")
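A somewhat fuller sketch of the same idea, assuming pyparsing is installed and using the camelCase names from this thread. The `source` text is invented for the example. Two things are my own additions: a `nestedExpr("(", ")")` for the condition, since a C-style `if` has one between the keyword and the body, and `Keyword` instead of `Literal` so that identifiers merely containing "if" are not picked up.

```python
from pyparsing import Keyword, nestedExpr, originalTextFor

# Invented sample input; the question does not include real source text.
source = """
x = 1;
if (a) {
    char c = '}';
    if (b) { y = 2; }
}
z = 3;
"""

condition = nestedExpr("(", ")")
body = nestedExpr("{", "}")

# originalTextFor returns the matched slice of the input instead of a token tree.
if_block = originalTextFor(Keyword("if") + condition + body)

# scanString yields (tokens, start, end) for each match it finds.
blocks = [tokens[0] for tokens, start, end in if_block.scanString(source)]
```

With this input, `blocks` holds a single string covering the whole outer `if`, including the inner one. The `'}'` literal does not end the block early because `nestedExpr` skips quoted strings by default (its `ignoreExpr` argument defaults to `quotedString`); comments would still need their own ignore expression.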
If you need nested ifs, you might be able to do something like:
expression = Forward()
if_statement = Literal('if') + nestedExpr("{", "}", expression)
# Or takes a list of alternatives
expression << ZeroOrMore(Or([if_statement, Regex('.')]))
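If the recursive grammar proves hard to get right, a different technique is to keep the flat expression and re-scan each matched block for inner ifs. This is only a sketch under the same assumptions as before (balanced braces, pyparsing installed); `all_if_blocks` and `sample` are hypothetical names, and an "if" that appears inside a string literal would be reported as a false match.

```python
from pyparsing import Keyword, nestedExpr, originalTextFor

if_block = originalTextFor(
    Keyword("if") + nestedExpr("(", ")") + nestedExpr("{", "}")
)


def all_if_blocks(text):
    """Return every if-block in text, outer blocks first, then the ones nested in them."""
    found = []
    for tokens, start, end in if_block.scanString(text):
        block = tokens[0]
        found.append(block)
        # Drop the leading "if" so the block does not match itself again,
        # then look for ifs nested inside it.
        found.extend(all_if_blocks(block[2:]))
    return found


sample = "if (a) { x = 1; if (b) { y = 2; } }"
blocks = all_if_blocks(sample)
```

For `sample`, `blocks` contains the outer block followed by `if (b) { y = 2; }`.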
Upvotes: 2