JB2

Reputation: 1607

Using regular expressions to denote the opener of nestedExpr

I am trying to use pyparsing to match nested expressions. Without having to specify the content expression, is there a way to use regular expressions to define the opener?

My opener consists of two tokens A and B. These two tokens may or may not be separated by whitespace and newline characters.

I am able to create a pyparsing expression for the opener when I also specify a content rule. However, is there a way to do this without specifying a content rule? Alternatively, how can I specify a rule that ignores the content?

from pyparsing import Word, ZeroOrMore, SkipTo, nestedExpr

opener = Word('A') + ZeroOrMore(' ') + ZeroOrMore('\n') + Word('B')
closer = 'END'
content_rule = SkipTo(opener | closer)

pat = nestedExpr(opener=opener, closer=closer, content=content_rule)

# data holds the source text being searched
for x in pat.scanString(data):
    print(x)
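For reference, here is a minimal sketch of how the pieces fit, assuming pyparsing 2.x or 3.x. `nestedExpr` raises a `ValueError` when the opener or closer is an expression rather than a plain string and no content rule is given, so some content rule is required. The content rule below is one illustrative choice, not the only one: it consumes a single whitespace-delimited chunk that does not begin an opener or a closer. `nestedExpr` repeats the content rule inside a `ZeroOrMore`. A rule that can match an empty string (as `SkipTo` does when it is already positioned at the closer) can therefore stop the repetition from advancing. The sample `data` string is hypothetical.

```python
from pyparsing import Keyword, Regex, nestedExpr

# Opener: token A, optional whitespace/newlines, token B, as one regular expression.
opener = Regex(r"A\s*B")
closer = Keyword("END")

# Without a content rule, nestedExpr only accepts string openers and closers.
try:
    nestedExpr(opener=opener, closer=closer)
except ValueError as err:
    print("ValueError:", err)

# Content rule: one whitespace-delimited chunk that does not start an opener or closer.
# It always consumes at least one character, so the repetition inside nestedExpr advances.
content_rule = ~opener + ~closer + Regex(r"\S+")

pat = nestedExpr(opener=opener, closer=closer, content=content_rule)

# Hypothetical input standing in for the question's `data`.
data = "junk A B x A\nB y END z END tail"

for tokens, start, end in pat.scanString(data):
    print(tokens.asList(), start)  # [['x', ['y'], 'z']] 5
```

The inner `A\nB ... END` pair comes back as its own sub-list, because `nestedExpr` tries a nested opener before it tries the content rule.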

Context: I am trying to extract if-blocks from source code files. So I will need a way of extracting nested expressions. This requires me to specify:

  1. An opener that consists of multiple tokens, which can be separated by whitespace ('if {').
  2. A way for the closer to match only the closing tags that correspond to the opener, since the closers for other blocks are the same as for the if-block. Consider loops, for example: while () {}. I am not sure if this is possible, however.
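On the first point: pyparsing skips whitespace, including newlines, before each token by default, so an opener built from two tokens already tolerates any spacing between them, without explicit `ZeroOrMore(' ')` or `ZeroOrMore('\n')` terms. Below is a short sketch of two ways to write the `if {` opener: as separate tokens, or as a single regular expression. The `\b` word boundary and the use of `Keyword` are added assumptions, there so that `elif {` is not mistaken for an opener.

```python
from pyparsing import Keyword, Literal, Regex

# Two tokens; default whitespace skipping absorbs spaces, tabs and newlines between them.
if_opener = Keyword("if") + Literal("{")

# The same opener as one regular expression; \s* allows any whitespace, newlines included.
if_opener_re = Regex(r"\bif\s*\{")

print(if_opener.parseString("if {").asList())        # ['if', '{']
print(if_opener.parseString("if\n   {").asList())    # ['if', '{']
print(repr(if_opener_re.parseString("if\n {")[0]))   # 'if\n {'
```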

Upvotes: 3

Views: 369

Answers (1)

Jesse Rusak

Reputation: 57188

You said:

I am not sure if this is possible however.

It isn't, at least for general C code. For example:

if (a) {
    char a = '}';
}

There's no good way for your parser to know (as opposed to guessing) that the first close curly-brace is not intended to close the if statement without actually parsing the interior. (Also comments, double-quoted strings, etc. Not to mention curly-brace-less ifs!)
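For what it's worth, `nestedExpr` already skips quoted strings by default (its `ignoreExpr` defaults to `quotedString`), and comments can be added to that ignore expression. The sketch below uses hypothetical inputs. It shows three things: the `'}'` case parses as intended, a `}` inside a `//` comment closes the block early, and a comment-aware ignore expression fixes that. This still amounts to partially lexing the interior, and it does nothing for brace-less ifs, so it narrows the guessing rather than removing it.

```python
from pyparsing import Keyword, ParseException, cppStyleComment, nestedExpr, quotedString

# Hypothetical inputs: a '}' inside a character literal, and one inside a // comment.
quoted_case = "if (a) {\n    char a = '}';\n}"
comment_case = "if (a) {\n    // }\n    x = 1;\n}"

# Default ignoreExpr is quotedString, so the brace inside '}' is not counted.
plain_if = Keyword("if") + nestedExpr("(", ")") + nestedExpr("{", "}")

# Also ignore C/C++ comments while counting braces.
ignore = quotedString | cppStyleComment
commented_if = (Keyword("if")
                + nestedExpr("(", ")", ignoreExpr=ignore)
                + nestedExpr("{", "}", ignoreExpr=ignore))


def parses_whole(expr, text):
    # True if expr consumes all of text, i.e. the block closed where intended.
    try:
        expr.parseString(text, parseAll=True)
        return True
    except ParseException:
        return False


print(parses_whole(plain_if, quoted_case))       # True
print(parses_whole(plain_if, comment_case))      # False: the commented '}' closes the block
print(parses_whole(commented_if, comment_case))  # True
```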

If you're confident the interior only has balanced curly braces, and you only want top-level if statements, my suggestion would be to do something like this (untested, but hopefully it gets the idea across):

from pyparsing import Literal, nestedExpr
pat = Literal('if') + nestedExpr("{", "}")
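A slightly fuller sketch for C-style code follows. It assumes that every `if` has a parenthesized condition followed by a braced body; the one-liner above matches the question's `if {` form, which has no condition. It swaps in `Keyword` for `Literal` so that `if` inside identifiers such as `elif` is not matched. It then uses `scanString` to report the source text of each outermost `if` block. The `source` text is hypothetical.

```python
from pyparsing import Keyword, nestedExpr

# Hypothetical C-like input.
source = """
int f(int a, int b) {
    if (a) {
        if (b) { g(); }
    }
    while (b) { h(); }
    if (b) { k(); }
}
"""

# Keyword refuses to match 'if' when it is part of a longer identifier.
if_block = Keyword("if") + nestedExpr("(", ")") + nestedExpr("{", "}")

# scanString resumes after the end of each match, so an if nested inside an earlier
# match is not reported separately: only the outermost ifs come back.
blocks = [source[start:end] for _, start, end in if_block.scanString(source)]

for block in blocks:
    print(block)
    print("---")
```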

If you need nested ifs, you might be able to do something like:

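from pyparsing import Forward, Literal, Regex, nestedExpr
expression = Forward()
if_statement = Literal('if') + nestedExpr("{", "}", expression)
# nestedExpr repeats its content and handles the braces itself, so match one
# nested if or one non-brace character ('.' would also swallow the closing '}')
expression <<= if_statement | Regex(r'[^{}]')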
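Here is a runnable sketch of the nested version for C-style `if (...) {...}` blocks; the names `body_item` and `count_ifs` and the input `src` are illustrative. Each body item is either a nested `if` or one non-brace character. Braces belonging to other blocks, like the `while` loop here, are matched by `nestedExpr`'s own nesting, so each `}` closes the innermost open `{` whatever kind of block opened it.

```python
from pyparsing import Forward, Group, Keyword, Regex, nestedExpr

body_item = Forward()

# An if statement: keyword, parenthesized condition, braced body made of body_items.
# Group keeps each if (with its condition and body) as its own sub-list.
if_statement = Group(
    Keyword("if") + nestedExpr("(", ")") + nestedExpr("{", "}", content=body_item)
)

# One character at a time gives the if alternative a chance at every position;
# excluding braces leaves '{' and '}' to nestedExpr's nesting and closing logic.
body_item <<= if_statement | Regex(r"[^{}]")


def count_ifs(tokens):
    """Count sub-lists that start with 'if' anywhere in a nested asList() result."""
    total = 0
    for tok in tokens:
        if isinstance(tok, list):
            if tok and tok[0] == "if":
                total += 1
            total += count_ifs(tok)
    return total


# Hypothetical input: an if inside a while loop inside an if.
src = """if (a) {
    x = 1;
    while (c) {
        if (d) { y(); }
    }
}"""

parsed = if_statement.parseString(src, parseAll=True)
print(count_ifs(parsed.asList()))  # 2
```

Matching one character at a time keeps the grammar simple but will be slow on large files; it is a readability choice for this sketch, not a recommendation.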

Upvotes: 2
