Mike Kelly

Reputation: 1029

Python regular expression not matching end of line

I'm trying to match a C/C++ function definition using a fairly complex regular expression. I've found a case where it's not working and I'm trying to understand why. Here is the input string which does not match:

   void Dump(const char * itemName, ofstream & os)

which clearly is a valid C++ method declaration. Here is the RE:

   ^[^=+-|#]*?([\w<>]+\s+(?!if|for|switch|while|catch|return)\w+)\s*\([^;=+-|]*$

This basically tries to distinguish a method declaration from other C syntax that looks like one, i.e. that has words followed by parentheses.

Using the very useful Python regular expression debugger (http://www.pythonregex.com/) I've narrowed it down to the trailing "$": if I remove the trailing $ from the regular expression, it matches the method signature above; if I leave the $ in, it doesn't. There must be some idiosyncrasy of Python REs that is eluding me here. Thanks.

Upvotes: 3

Views: 1673

Answers (2)

Alan Moore

Reputation: 75222

The output of PythonRegex is somewhat misleading. The results of r.groups() and r.findall() are both the same: u'void Dump', which is the content of the first capturing group. If it showed the whole match, you'd see that when you remove the $ you're only matching

void Dump(

...not the whole function definition as you intended. The reason for that (as Greg explained) is a syntax error in your last character class. You need to escape the hyphen by listing it first ([^-;=+|]) or last ([^;=+|-]), or by adding a backslash ([^;=+\-|]).
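The difference between the first group and the whole match can be checked directly in Python by comparing group(1) with group(0). This is only a minimal sketch: it assumes the pattern and input string from the question, and the variable names are illustrative.

```python
import re

signature = "void Dump(const char * itemName, ofstream & os)"

# The pattern from the question, once without and once with the trailing $.
body = r"^[^=+-|#]*?([\w<>]+\s+(?!if|for|switch|while|catch|return)\w+)\s*\([^;=+-|]*"
without_anchor = re.compile(body)
with_anchor = re.compile(body + "$")

m = without_anchor.search(signature)
first_group = m.group(1)   # what r.groups() / r.findall() report
whole_match = m.group(0)   # everything the pattern actually consumed

print(repr(first_group))
print(repr(whole_match))

# With the $ the pattern cannot reach the end of the line, so there is no match.
anchored_result = with_anchor.search(signature)
print(anchored_result)
```

The whole match stops right after the opening parenthesis, because the next character, "c", falls inside the accidental +-| range and is therefore rejected by the negated class.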

The only way I can see to get PythonRegex to show the whole match is by removing all capturing groups (or converting them to non-capturing).

Upvotes: 1

Greg Hewgill

Reputation: 992847

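The use of +-| in your character class [^;=+-|] is a range specification: it covers every character from + (0x2B) to | (0x7C), which includes all ASCII digits and letters as well as most punctuation. This results in the character class containing (actually excluding, since you're using ^) much more than you intend. The same applies to [^=+-|#] at the start of your expression. To specify a literal - in a character class, mention it first, like [^-;=+|].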
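A short sketch of the effect, assuming the input string from the question. The names below are illustrative, and fixed_pattern is just the question's expression with the hyphen moved to the front of both character classes.

```python
import re

signature = "void Dump(const char * itemName, ofstream & os)"

broken_class = re.compile(r"[^;=+-|]")   # +-| is read as the range '+'..'|'
fixed_class = re.compile(r"[^-;=+|]")    # leading - is a literal hyphen

# Distinct characters of the signature that each class accepts.
accepted_by_broken = sorted(set(broken_class.findall(signature)))
accepted_by_fixed = sorted(set(fixed_class.findall(signature)))
print(accepted_by_broken)   # only characters below '+' survive
print(len(accepted_by_fixed) == len(set(signature)))

# The question's expression with the hyphen listed first in both classes.
fixed_pattern = re.compile(
    r"^[^-=+|#]*?([\w<>]+\s+(?!if|for|switch|while|catch|return)\w+)\s*\([^-;=+|]*$"
)
fixed_match = fixed_pattern.search(signature)
print(fixed_match.group(0) if fixed_match else None)
```

With the range in place, the class rejects every letter, digit and comma in the parameter list, so the trailing $ can never be reached. With a literal hyphen, the whole signature matches.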

Upvotes: 4
