john

Reputation: 251

Small problem with regexps in Python

So I have one variable that holds all the code from a file, and I need to remove all comments from it. One of my regexp lines is this:

x=re.sub('\/\*.*\*\/','',x,re.M,re.S);

What I want this to do is remove all multi-line comments. For some odd reason, though, it's skipping two instances of */ and removing everything up to the third instance of */.

I'm pretty sure the reason is that this third instance of */ has code after it, while the first two are on lines by themselves. I'm not sure why that would matter, but I'm pretty sure that's why.
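A minimal reproduction, using a made-up C-style snippet (the `source` string is hypothetical). One detail about the call above: `re.sub` has the signature `re.sub(pattern, repl, string, count=0, flags=0)`, so passing `re.M, re.S` positionally makes `re.M` the `count` argument and only `re.S` the flags (Python 3.13 also deprecates passing `count` and `flags` positionally). `re.S` (DOTALL) lets `.` match newlines, and the greedy `.*` then runs from the first `/*` to the last `*/` in the string. Whether code follows a `*/` on the same line does not change this.

```python
import re

# Hypothetical input: two comments on their own lines, then one followed by code.
source = """/* first */
int a = 1;
/* second */
int b = 2;
/* third */ int c = 3;
"""

# Same pattern as in the question, with flags passed by keyword.
# re.S lets '.' cross newlines, and the greedy '.*' extends to the LAST '*/',
# so both declarations between the comments are deleted as well.
greedy = re.sub(r'/\*.*\*/', '', source, flags=re.S)
print(repr(greedy))
```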

Any ideas?

Upvotes: 2

Views: 88

Answers (3)

verisimilidude

Reputation: 736

The regular expression is "greedy": when presented with several stopping points, it will take the farthest one. Regex has some patterns to help control this, in particular the negative lookbehind

(?<!...)

which matches the following expression only if it is not preceded by a match of the pattern in parentheses.

(?*...) was not in Python 2.4 but is a good choice if you are using a later version.
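The answer above points at lookaround assertions. A common lookaround-based fix (a swapped-in technique, not necessarily what the answer had in mind) uses a negative *lookahead* rather than a lookbehind: consume one character at a time only while `*/` does not start at that position, so the match can never run past the first closing marker. A sketch on a hypothetical input:

```python
import re

# (?!\*/) is a negative lookahead: it succeeds only if '*/' does not start here.
# (?:(?!\*/).)* therefore consumes characters one at a time and stops right
# before the first '*/', even though the outer '*' is greedy.
# re.S lets '.' match newlines so multi-line comments are handled too.
COMMENT = re.compile(r'/\*(?:(?!\*/).)*\*/', re.S)

source = "/* first */\nint a = 1;\n/* second */ int b = 2;\n"
result = COMMENT.sub('', source)
print(repr(result))
```

This pattern is often called a "tempered greedy token". For this particular problem the lazy `.*?` shown in the other answers is simpler and gives the same result.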

Upvotes: 1

user206545

Reputation:

The expression .* is greedy, meaning that it will attempt to match as many characters as possible. Instead, use (.*?), which matches as few characters as possible and stops at the first place the rest of the pattern can match.
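A quick illustration of the difference on a made-up one-line input. The pattern is written without the parentheses here, because with a capturing group `re.findall` would return only the text between the comment markers instead of the whole match.

```python
import re

line = "/* a */ code(); /* b */"

# Greedy: .* runs as far as it can, so a single match spans both comments.
greedy_matches = re.findall(r'/\*.*\*/', line)
print(greedy_matches)  # ['/* a */ code(); /* b */']

# Lazy: .*? stops at the first '*/' it reaches, giving two separate matches.
lazy_matches = re.findall(r'/\*.*?\*/', line)
print(lazy_matches)    # ['/* a */', '/* b */']
```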

Upvotes: 1

Mario

Reputation: 36487

.* will always match as many characters as possible. Try (.*?) instead: the trailing ? makes the quantifier lazy, so most implementations will then match as few characters as possible (it also works without the parentheses, which only add a capturing group). So your whole pattern should look like this: \/\*.*?\*\/ or \/\*(.*?)\*\/
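A sketch of the complete fix, wrapped in a hypothetical helper named `strip_block_comments` (not from the original post). It passes `flags=re.S` by keyword so that `.` also matches newlines inside multi-line comments, and it checks that the pattern with and without the parentheses removes exactly the same text.

```python
import re

def strip_block_comments(code: str) -> str:
    """Hypothetical helper: remove /* ... */ comments, including multi-line ones."""
    # .*? is lazy, so each match ends at the nearest '*/';
    # re.S lets '.' match newlines inside multi-line comments.
    return re.sub(r'/\*.*?\*/', '', code, flags=re.S)

source = """/* header
   comment */
int a = 1; /* trailing */
int b = 2;
"""

cleaned = strip_block_comments(source)
with_group = re.sub(r'/\*(.*?)\*/', '', source, flags=re.S)
# The parentheses only add a capturing group; the removed text is identical.
assert cleaned == with_group
print(repr(cleaned))
```

Like any regex-based approach, this will also strip `/* ... */` sequences that appear inside string literals, so it is only safe when the input is known not to contain them.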

Upvotes: 4
