Valentin B.

Reputation: 622

Regexp matching for assignment operations in C

I'm trying to write a regexp that catches assignment equal signs within conditional statements in C that I have already extracted (using the Python re module).

My attempt:

exp = re.compile(r'\(\s*[0-9A-Za-z_]+\s*[^!<>=]=[^=]')

While working for a number of cases, it fails to match a simple case like the following string:

'(c=getc(pp)) == EOF'

Can someone please explain why my regexp does not match this string, and how I could make it better? I'm aware that some weird cases might still elude me, but I can treat those manually; the purpose is to do the bulk of the legwork automatically.

Upvotes: 2

Views: 317

Answers (2)

ohmmega

Reputation: 55

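The reason this does not work is the [^!<>=]= part: the character class must consume one character that is not !, <, > or =, and only then may the = follow. In (c=getc(pp)) the identifier c is immediately followed by =, so there is no character left for the class to consume and the match fails. I can see your intention in doing so, but it's the wrong way.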
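A minimal sketch of that behaviour, using the pattern from the question unchanged. The second and third sample strings are made up for illustration: with a longer identifier or a space before the =, the regex engine can backtrack and hand one character to the class, so those variants do match.

```python
import re

# Pattern copied from the question.
original = re.compile(r'\(\s*[0-9A-Za-z_]+\s*[^!<>=]=[^=]')

samples = [
    '(c=getc(pp)) == EOF',   # string from the question: nothing between 'c' and '='
    '(ch=getc(pp)) == EOF',  # hypothetical: 'h' can be consumed by [^!<>=]
    '(c =getc(pp)) == EOF',  # hypothetical: the space can be consumed by [^!<>=]
]

# Map each sample to its match object (or None when there is no match).
results = {s: original.search(s) for s in samples}

for s, m in results.items():
    print(repr(s), '->', m.group(0) if m else None)
```

Only the first string, the one from the question, is reported as None.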

For the simple case, have a look at the following expression:

[0-9A-Za-z_]+\s*=\s*[0-9A-Za-z_]+(\(\s*[0-9A-Za-z_]*\s*\))?

This matches the c=getc(pp) part of your source, because it looks for a single = that is preceded by an identifier and followed by an identifier or a number, with optional whitespace on either side. This alone prevents the regex from matching ==, <=, !=, or >=.

Besides that, it also checks whether the right-hand side is a function call, a variable, or just a number (the parenthesized part at the end of the expression is made optional by the ?). Note also the * in the part within the parentheses ([0-9A-Za-z_]*), which lets you match function calls without parameters.

Works for:

(c=getc(p)) == EOF
(c =getc()) == EOF
(c=getc( )) == EOF
(c = getc( p )) == EOF
(c = i) == EOF
(c=10) == EOF

This will not work for constructs such as x = y(z()) (and surely many more).
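As a rough check, the expression can be run through Python's re module against the strings listed above. The helper name matched_text and the extra comparison-only strings are assumptions added for illustration, not part of the original answer.

```python
import re

# Expression from this answer, compiled unchanged.
assign = re.compile(
    r'[0-9A-Za-z_]+\s*=\s*[0-9A-Za-z_]+(\(\s*[0-9A-Za-z_]*\s*\))?'
)

def matched_text(source):
    """Return the text matched by the expression, or None if nothing matches."""
    m = assign.search(source)
    return m.group(0) if m else None

cases = [
    '(c=getc(p)) == EOF',
    '(c =getc()) == EOF',
    '(c=getc( )) == EOF',
    '(c = getc( p )) == EOF',
    '(c = i) == EOF',
    '(c=10) == EOF',
    'x = y(z())',   # nested call: only part of the right-hand side is captured
    'a == b',       # hypothetical comparison-only inputs
    'a <= b',
    'a != b',
    'a >= b',
]

for case in cases:
    print(repr(case), '->', matched_text(case))
```

For x = y(z()) the expression still finds an assignment, but it stops at x = y because the optional parenthesized part cannot span the nested call.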

That said, I recommend the following link (not exactly your question, but really nice insights): Regular expression to recognize variable declarations in C

Upvotes: 1

Jean-François Fabre

Reputation: 140186

[^!<>=] following your identifier prevents the = from being matched right after c.

If your intention is to match assignments, try to match only one equal sign after the identifier, like this:

import re

exp = re.compile(r'\(\s*[0-9A-Za-z_]+\s*=[^=]')

print(exp.search('(c=getc(pp)) == EOF'))

which results in:

<_sre.SRE_Match object; span=(0, 4), match='(c=g'>
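A small sketch to check that this pattern still rejects comparison operators. The helper name has_assignment and the comparison strings are assumptions made up for illustration.

```python
import re

# Pattern from this answer, compiled unchanged.
exp = re.compile(r'\(\s*[0-9A-Za-z_]+\s*=[^=]')

def has_assignment(condition):
    """Return True if the pattern finds an assignment in the condition."""
    return exp.search(condition) is not None

for condition in [
    '(c=getc(pp)) == EOF',  # string from the question
    '(a == b)',             # hypothetical comparisons
    '(a <= b)',
    '(a != b)',
    '(a >= b)',
    'a = b',                # hypothetical: no opening parenthesis
]:
    print(repr(condition), '->', has_assignment(condition))
```

Note that the pattern requires an opening parenthesis directly before the identifier, so an assignment such as a = b without one is not reported.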

Upvotes: 1
