Reputation: 1441
I am trying to parse FSM statements of the Gezel language (http://rijndael.ece.vt.edu/gezel2/) using Python and regular expressions:
regex_cond = re.compile(r'.+((else\tif|else|if)).+')
line2 = '@s0 else if (insreg==1) then (initx,PING,notend) -> sinitx;'
match = regex_cond.match(line2)
I am having trouble distinguishing if from else if. The else if in the example is recognized as an if.
Upvotes: 0
Views: 1501
Reputation: 123652
Don't do this; use pyparsing instead. You'll thank yourself later.
The problem is that .+ is greedy, so it's eating up the else; use .+? instead. Or rather, don't, because you're using pyparsing now.
regex_cond = re.compile( r'.+?(else\sif|else|if).+?' )
...
# else if
Upvotes: 2
Reputation: 879621
Your immediate problem is that .+ is greedy and so it matches @s0 else instead of just @s0. To make it non-greedy, use .+? instead:
import re
regex_cond = re.compile(r'.+?(else\s+if|else|if).+')
line2 = '@s0 else if (insreg==1) then (initx,PING,notend) -> sinitx;'
match = regex_cond.match(line2)
print(match.groups())
# ('else if',)
However, as others have suggested, using a parser such as Pyparsing is a better method than using re here.
Upvotes: 1
Reputation: 3666
Correct me if I'm wrong, but regular expressions are not good for parsing, since they are only sufficient for Type-3 (regular) languages. For example, you can't decide whether or not ((())())) is a valid statement without "counting", which a regex can't do. Or, to take your example, if else else could not be detected as invalid. Maybe I'm mixing up scanner and parser; if so, please tell me.
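To make the "counting" point concrete, here is a minimal sketch of a balance check written as plain Python rather than as a regex. The function name is_balanced is made up for this illustration and has nothing to do with Gezel or with any library:

```python
def is_balanced(text):
    """Return True if every '(' in text has a matching ')'."""
    depth = 0  # number of parentheses currently open
    for ch in text:
        if ch == '(':
            depth += 1
        elif ch == ')':
            depth -= 1
            if depth < 0:  # a ')' appeared before its '('
                return False
    return depth == 0  # nothing may be left open at the end

print(is_balanced('((())())'))   # True
print(is_balanced('((())()))'))  # False
```

The depth counter is exactly the unbounded state that a classical regular expression lacks, which is why nested structures are usually left to a parser.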
Upvotes: 0
Reputation: 1448
A \t matches a tab character. It doesn't look like you have a tab character between "else" and "if" in line2. You might try \s instead, which matches any whitespace character.
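A quick way to see the difference is to run both patterns on the line from the question. This is only a sketch: the variable names are invented here, and it assumes the non-greedy .+? prefix suggested in the other answers so that the prefix does not swallow the else:

```python
import re

line2 = '@s0 else if (insreg==1) then (initx,PING,notend) -> sinitx;'

# \t requires a literal tab, so 'else\tif' cannot match "else if" written with a space.
tab_pattern = re.compile(r'.+?(else\tif|else|if).+')
# \s+ accepts one or more whitespace characters of any kind (space, tab, ...).
space_pattern = re.compile(r'.+?(else\s+if|else|if).+')

tab_result = tab_pattern.match(line2).group(1)
space_result = space_pattern.match(line2).group(1)
print(tab_result)    # else
print(space_result)  # else if
```

With \t the first alternative can never match this input, so the group falls back to plain else; with \s+ the full else if is captured.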
Upvotes: 3