user1471

Reputation: 469

Regex character match counter

I am writing a Python script that needs to strip all methods with a particular syntax from a source file.

Suppose the source file contains methods that look like this:

fn difflml(args)[
       if [
            --blah 
           ]
       [ var ]
] -- END OF THE METHOD

--Othed method starts and stuffs

Can I strip methods of this style from the source file using a regex?

I don't know how to keep count of [ and ] so that I can strip the whole method. My idea was to keep a counter, incrementing it on [ and decrementing it on ], and to print when the count is back to 0.

As I am fairly new to regex, I am not sure whether this can be done within the regex itself.

Upvotes: 1

Views: 239

Answers (3)

Jon Clements

Reputation: 142176

Here's a quick example using pyparsing that matches the nested brackets and strips a trailing comment such as "-- END OF THE METHOD":

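from pyparsing import Group, nestedExpr, originalTextFor, restOfLine

# text is assumed to hold the contents of the source file.
# originalTextFor stands in for the deprecated keepOriginalText parse action.
parser = originalTextFor(nestedExpr('[', ']')) + Group('--' + restOfLine).suppress()
print(parser.transformString(text))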

With your example code, this produces:

fn difflml(args)[
       if [
            --blah 
           ]
       [ var ]
]

--Othed method starts and stuffs

Upvotes: 1

Hans Then

Reputation: 11322

This is impossible to do properly with (only) a regex. Because the [ and ] characters can be nested recursively, a regular expression cannot match them: it has no stack with which to keep track of matching brackets. A good rule of thumb is that if you have recursive patterns (patterns that can be nested inside themselves), you cannot use regular expressions.

The proper method would be to use a tokenizer using regular expressions and then create a recursive descent parser. Depending on your skill in writing parser code, this will set you back a few days of coding.
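A much smaller sketch than a full recursive descent parser, in the spirit of the counter idea from the question: find brackets with a regex and track the nesting depth by hand. The header pattern `fn name(args)[` and the function name `strip_methods` are assumptions made for illustration. Brackets that appear inside comments or strings would throw the count off, which is where a real tokenizer earns its keep.

```python
import re

# Assumed header shape, taken from the sample in the question: fn name(args)[
HEADER = re.compile(r"\bfn\s+\w+\s*\([^)]*\)\s*\[")
BRACKET = re.compile(r"[\[\]]")


def strip_methods(source):
    """Return source with every bracket-balanced fn ...[ ... ] block removed."""
    out = []
    pos = 0
    while True:
        header = HEADER.search(source, pos)
        if header is None:
            out.append(source[pos:])  # nothing left to strip
            break
        out.append(source[pos:header.start()])
        depth = 1            # the header match already consumed the opening '['
        end = len(source)    # unbalanced input: strip through end of text
        for tok in BRACKET.finditer(source, header.end()):
            depth += 1 if tok.group() == "[" else -1
            if depth == 0:
                end = tok.end()
                break
        pos = end
    return "".join(out)
```

Anything after the final closing bracket, such as the "-- END OF THE METHOD" comment, is left in place by this sketch.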

The improper but crudely effective way would be to recognise that the beginning of a function and the end of a function both start at the same indentation level. You can create a special regex that does not match the recursive pattern, but simply matches anything between the start of your function definition and a closing bracket that starts at the beginning of a line. This will probably take you an hour or two to write and debug.
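As a rough sketch of that indentation-based shortcut: the pattern below assumes (this is not stated in the question) that every method header `fn name(args)[` starts in column 0, that the method's closing ] is the first character on its line, and that all nested brackets are indented. The name `strip_methods_by_indent` is made up for the example.

```python
import re

# Assumptions: header starts in column 0, closing ']' starts its own line,
# and nested brackets are always indented.
METHOD = re.compile(
    r"^fn[ \t]+\w+[ \t]*\([^)]*\)[ \t]*\["  # header at the start of a line
    r".*?"                                   # lazily skip the body, newlines included
    r"^\][^\n]*\n?",                         # first ']' in column 0, plus the rest of that line
    re.MULTILINE | re.DOTALL,
)


def strip_methods_by_indent(source):
    """Remove each method, including any trailing comment on its closing line."""
    return METHOD.sub("", source)
```

The non-greedy `.*?` stops at the first ] in column 0, so no bracket counting takes place. If the indentation assumption is broken, the match ends too early.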

Upvotes: 2

Krzysztof Jabłoński

Reputation: 1941

I think this can be done with a regex, but without counting brackets (that is something a regex engine cannot do). A reluctant quantifier can be used instead to match up to the first occurrence of the method's closing bracket (assuming it is always the first or only character on its line, or that the -- END ... comment is always present).

In my opinion, however, a regex is not the appropriate tool for this purpose, because it can be very inefficient in memory and time over long, multi-line, heavily branched code.

Consider writing a simple parser instead.

Upvotes: 1
