mrks

Reputation: 1441

Regular expression: if, else if, else

I am trying to parse FSM statements of the Gezel language (http://rijndael.ece.vt.edu/gezel2/) using Python and regular expressions:

regex_cond = re.compile(r'.+((else\tif|else|if)).+')  
line2 = '@s0 else if (insreg==1) then (initx,PING,notend) -> sinitx;'
match = regex_cond.match(line2);

I am having trouble distinguishing if from else if. The else if in the example is recognized as an if.

Upvotes: 0

Views: 1501

Answers (4)

Katriel

Reputation: 123652

Don't do this; use pyparsing instead. You'll thank yourself later.


The problem is that .+ is greedy, so it's eating up the else; use the non-greedy .+? instead. Or rather, don't, because you're using pyparsing now.

regex_cond = re.compile( r'.+?(else\sif|else|if).+?' )
...
# else if

Upvotes: 2

unutbu

Reputation: 879621

Your immediate problem is that .+ is greedy and so it matches @s0 else instead of just @s0. To make it non-greedy, use .+? instead:

import re

regex_cond = re.compile(r'.+?(else\s+if|else|if).+')  
line2 = '@s0 else if (insreg==1) then (initx,PING,notend) -> sinitx;'
match = regex_cond.match(line2)
print(match.groups())
# ('else if',)
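
To see what each prefix actually consumes, here is a small sketch that wraps the prefix in an extra capturing group. That extra group is my addition for inspection only; it is not part of the pattern above.

```python
import re

line2 = '@s0 else if (insreg==1) then (initx,PING,notend) -> sinitx;'

# Same keyword alternation in both patterns; only the prefix quantifier differs.
# The capturing group around the prefix is added here purely for inspection.
greedy = re.compile(r'(.+)(else\s+if|else|if).+')
lazy = re.compile(r'(.+?)(else\s+if|else|if).+')

greedy_groups = greedy.match(line2).groups()
lazy_groups = lazy.match(line2).groups()

print(greedy_groups)  # ('@s0 else ', 'if')
print(lazy_groups)    # ('@s0 ', 'else if')
```

The greedy prefix backtracks from the end of the line and stops at the last position where any alternative fits, which is the bare if.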

However, like others have suggested, using a parser like Pyparsing is a better method than using re here.

Upvotes: 1

InsertNickHere

Reputation: 3666

Correct me if I'm wrong, but regular expressions are not good for parsing, since they are only sufficient for Type-3 (regular) languages. For example, you can't decide whether or not ((())())) is a valid statement without "counting", which a regex can't do. Or, to talk about your example, if else else could not be detected as invalid. Maybe I'm mixing up scanner and parser; in that case please tell me.
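
As a rough illustration of the "counting" mentioned above, here is a minimal sketch of a parenthesis checker. The function name is_balanced is made up for this example, and it only looks at parentheses, not at Gezel syntax.

```python
def is_balanced(text):
    """Return True if every '(' in text has a matching ')'."""
    depth = 0  # number of currently open parentheses
    for ch in text:
        if ch == '(':
            depth += 1
        elif ch == ')':
            depth -= 1
            if depth < 0:
                # A ')' appeared with no '(' left to close.
                return False
    # Balanced only if nothing is left open at the end.
    return depth == 0


print(is_balanced('((())()))'))  # False: one ')' too many
print(is_balanced('((())())'))   # True
```

The counter can grow without bound, which is exactly the state a single classical regular expression has no way to keep.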

Upvotes: 0

Alex B

Reputation: 1448

A \t matches a tab character. It doesn't look like you have a tab character between "else" and "if" in line2. You might try \s instead, which matches any whitespace character.
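
A quick check of the difference, as a sketch. The variable names are made up for this example, and \s+ (one or more whitespace characters) is used rather than a single \s.

```python
import re

text = 'else if'  # separated by a single space, as in line2

tab_pattern = re.compile(r'else\tif')   # requires a literal tab
ws_pattern = re.compile(r'else\s+if')   # accepts any run of whitespace

tab_match = tab_pattern.search(text)          # None: there is no tab in text
ws_match = ws_pattern.search(text)            # matches 'else if'
ws_tab_match = ws_pattern.search('else\tif')  # \s+ also accepts a tab

print(tab_match)
print(ws_match.group(0))
print(ws_tab_match is not None)
```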

Upvotes: 3
