Reputation: 622
I'm trying to write a regexp that catches assignment equal signs (=) inside conditional statements of C code that I have already extracted (using the Python module re).
My attempt:
import re
exp = re.compile(r'\(\s*[0-9A-Za-z_]+\s*[^!<>=]=[^=]')
While it works for a number of cases, it fails to match a simple case like the following string:
'(c=getc(pp)) == EOF'
Can someone please explain why my regexp does not match this string, and how I could improve it? I'm aware that some weird cases might still elude me, but I can treat those manually; the purpose is to do the bulk of the legwork automatically.
Upvotes: 2
Views: 317
Reputation: 55
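The reason why this does not work is [^!<>=]=, which makes your pattern look for one character that is not !, <, > or =, followed by a character that is =. In (c=getc(pp)) the = comes directly after c, so there is no character left for [^!<>=] to consume and the match fails. I can see your intention in doing so, but it's the wrong way.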
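A quick way to see this (a minimal sketch with Python's standard re module; apart from the string from the question, the test strings are made up for illustration):

```python
import re

# The pattern from the question.
orig = re.compile(r'\(\s*[0-9A-Za-z_]+\s*[^!<>=]=[^=]')

# [^!<>=] must consume one extra character before '=', so the pattern only
# succeeds when something sits between the identifier and '='.
print(orig.search('(ab=3)').group())    # (ab=3   ('b' is eaten by [^!<>=])
print(orig.search('(c = x)').group())   # '(c = '  (the space is eaten)

# A one-character identifier directly followed by '=' leaves nothing for it.
print(orig.search('(c=getc(pp)) == EOF'))  # None
```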
For the simple case, have a look at the following expression:
[0-9A-Za-z_]+\s*=\s*[0-9A-Za-z_]+(\(\s*[0-9A-Za-z_]*\s*\))?
This matches the c=getc(pp) part of your source, because it looks for a = that has optional whitespace and at least one letter, digit or underscore on each side. This alone already prevents the regex from matching ==, <=, !=, or >=.
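A small check of that claim, assuming Python 3 and the standard re module (the comparison strings are just examples):

```python
import re

# Identifier, '=', then an identifier or number, optionally followed by
# a simple "( arg )" call suffix.
assign = re.compile(r'[0-9A-Za-z_]+\s*=\s*[0-9A-Za-z_]+(\(\s*[0-9A-Za-z_]*\s*\))?')

# A word character must appear right after the (optional) whitespace that
# follows a single '=', so a second '=' or a '<', '>', '!' before it fails.
for expr in ['a == b', 'a <= b', 'a != b', 'a >= b', 'a==b']:
    print(repr(expr), assign.search(expr))   # None for each

print(assign.search('c=getc(pp)').group())   # c=getc(pp)
```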
Apart from that, it also checks whether the right-hand side is a function call or simply a variable or a number (the bracket part at the end of the expression is made optional by the ?). Note also the * in the part that matches inside the parentheses ([0-9A-Za-z_]*), which enables you to match function calls without parameters.
Works for:
(c=getc(p)) == EOF
(c =getc()) == EOF
(c=getc( )) == EOF
(c = getc( p )) == EOF
(c = i) == EOF
(c=10) == EOF
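The cases above can be checked with a short loop like this (a sketch; re.search is used so the match may start anywhere in the string):

```python
import re

assign = re.compile(r'[0-9A-Za-z_]+\s*=\s*[0-9A-Za-z_]+(\(\s*[0-9A-Za-z_]*\s*\))?')

cases = [
    '(c=getc(p)) == EOF',
    '(c =getc()) == EOF',
    '(c=getc( )) == EOF',
    '(c = getc( p )) == EOF',
    '(c = i) == EOF',
    '(c=10) == EOF',
]
for s in cases:
    # group() is the whole match, i.e. the assignment inside the condition
    print(assign.search(s).group())
# c=getc(p)
# c =getc()
# c=getc( )
# c = getc( p )
# c = i
# c=10
```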
It will not match the whole right-hand side of constructs such as x = y(z()) (and surely many more).
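With such a nested call, re.search still finds the assignment but stops early, because the optional call group cannot handle the inner parentheses (the test string is illustrative):

```python
import re

assign = re.compile(r'[0-9A-Za-z_]+\s*=\s*[0-9A-Za-z_]+(\(\s*[0-9A-Za-z_]*\s*\))?')

# The call group allows only one non-nested argument, so for y(z()) it
# fails and, being optional, is simply skipped.
print(assign.search('(x = y(z())) == 0').group())   # x = y
```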
That being said, I recommend the following question (not exactly yours, but it has really nice insights): Regular expression to recognize variable declarations in C
Upvotes: 1
Reputation: 140186
The [^!<>=] following your identifier has to consume one character, which prevents = from being matched right after c.
If your intention is to match assignments, try to match only one equal sign after the identifier, like this:
import re
exp = re.compile(r'\(\s*[0-9A-Za-z_]+\s*=[^=]')
print(exp.search('(c=getc(pp)) == EOF'))
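which results in:
<_sre.SRE_Match object; span=(0, 4), match='(c=g'>
(On Python 3.7 and later the same match is printed as <re.Match object; span=(0, 4), match='(c=g'>.)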
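To check that this version also rejects the comparison operators, something like the following could be used (the extra strings are invented examples):

```python
import re

exp = re.compile(r'\(\s*[0-9A-Za-z_]+\s*=[^=]')

# Exactly one '=' must follow the identifier, and the next character must
# not be another '=', so ==, <=, != and >= are not matched.
for s in ['(c=getc(pp)) == EOF', '(c == EOF)', '(c <= 10)', '(c != 0)', '(len = n)']:
    m = exp.search(s)
    print(repr(s), '->', m.group() if m else None)
```

The first and last strings match ('(c=g' and '(len = '); the three comparisons give None.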
Upvotes: 1