Reputation: 251
I have one variable that holds all the code from a file, and I need to remove all comments from it. One of my regexp lines is this:
x=re.sub('\/\*.*\*\/','',x,re.M,re.S);
What I want this to do is remove all multi-line comments. For some odd reason, though, it's skipping two instances of */ and removing everything up to the third instance of */.
I'm pretty sure the reason is that the third instance of */ has code after it, while the first two are by themselves on the line. I'm not sure why this matters, but I'm pretty sure that's why.
Any ideas?
Upvotes: 2
Views: 88
Reputation: 736
The regular expression is "greedy": when presented with several possible stopping points, it takes the farthest one. Regex has some patterns to help control this, in particular the negative lookbehind
(?<!...)
which matches the following expression only if it is not preceded by a match of the pattern in parentheses.
(?*...) was not in Python 2.4 but is a good choice if you are using a later version.
Upvotes: 1
Reputation:
The expression .*
is greedy, meaning that it will attempt to match as many characters as possible. Instead, use (.*?)
which will stop matching characters as soon as possible.
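A small sketch of the difference, using a made-up sample string (the names `src`, `greedy`, and `lazy` are illustrative, not from the question):

```python
import re

# Hypothetical sample: two block comments with code between them.
src = "a = 1; /* first */ b = 2; /* second */ c = 3;"

# Greedy: .* runs to the last */, swallowing the code in between.
greedy = re.sub(r'/\*.*\*/', '', src)

# Non-greedy: .*? stops at the first */ after each /*.
lazy = re.sub(r'/\*.*?\*/', '', src)

print(greedy)
print(lazy)
```

With the greedy pattern the assignment to `b` disappears along with both comments; with the non-greedy one only the comments are removed.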
Upvotes: 1
Reputation: 36487
.*
will always match as many characters as possible. Try (.*?)
- most implementations should then try to match as few characters as possible (it also works without the parentheses). So your whole pattern should look like this: \/\*.*?\*\/
or \/\*(.*?)\*\/
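Applied to the call in the question, a minimal sketch might look like the following. One thing worth flagging: the fourth positional argument of `re.sub` is `count`, not flags, so in the original call `re.M` ends up being read as a count. The flags are passed by keyword here; `re.S` lets `.` cross newlines, and `re.M` is left out on the assumption that it is not needed, since the pattern has no `^` or `$`. The function name `strip_block_comments` and the sample text are made up for illustration.

```python
import re

def strip_block_comments(code):
    """Remove /* ... */ comments, including ones spanning several lines."""
    # .*? is non-greedy, so each match ends at the nearest */.
    # flags= is passed by keyword: the 4th positional argument of re.sub is count.
    return re.sub(r'/\*.*?\*/', '', code, flags=re.S)

# Hypothetical input with one multi-line and one single-line comment.
sample = """int a; /* one
line two */
int b;
/* three */ int c;"""

cleaned = strip_block_comments(sample)
print(cleaned)
```

This sketch does not try to handle `/*` appearing inside string literals; that would need more than a single regular expression.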
Upvotes: 4