Reputation: 189626
Is there a way to special-case a ply lexer rule?
t_IDENT = r'[a-zA-Z_][0-9a-zA-Z_]*'
t_OPERATOR = r'[<>=/*+-]+'
t_DEFINE = r'='
t_PRODUCES = r'=>'
I want to define an operator as any combination of the listed characters, except that = and => have their own special cases. For example:
a + b
# IDENT('a') OPERATOR('+') IDENT('b')
a ++=--> b
# IDENT('a') OPERATOR('++=-->') IDENT('b')
a == b
# IDENT('a') OPERATOR('==') IDENT('b')
a => b
# IDENT('a') PRODUCES('=>') IDENT('b')
a = b
# IDENT('a') DEFINE('=') IDENT('b')
a >= b
# IDENT('a') OPERATOR('>=') IDENT('b')
a <=> b
# IDENT('a') OPERATOR('<=>') IDENT('b')
Upvotes: 1
Views: 315
Reputation: 189626
I removed the t_DEFINE and t_PRODUCES string rules and used the reserved-word technique to handle the special cases:
special_operators = {'=': 'DEFINE',
                     '=>': 'PRODUCES'}

def t_OPERATOR(t):
    r'[<>=/*+-]+'
    # Re-tag the token when the whole lexeme is one of the special operators.
    t.type = special_operators.get(t.value, t.type)
    return t
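For reference, here is a minimal runnable sketch of the whole lexer built around this technique (the tokens tuple, t_ignore, the t_error handler, and the sample input are my additions, not part of the original code):

import ply.lex as lex

tokens = ('IDENT', 'OPERATOR', 'DEFINE', 'PRODUCES')

t_IDENT = r'[a-zA-Z_][0-9a-zA-Z_]*'
t_ignore = ' \t'

special_operators = {'=': 'DEFINE',
                     '=>': 'PRODUCES'}

def t_OPERATOR(t):
    r'[<>=/*+-]+'
    # The longest run of operator characters is matched first;
    # only exact matches of '=' or '=>' are re-tagged.
    t.type = special_operators.get(t.value, t.type)
    return t

def t_error(t):
    t.lexer.skip(1)

lexer = lex.lex()
lexer.input('a => b == c')
for tok in lexer:
    print(tok.type, tok.value)
# IDENT a
# PRODUCES =>
# IDENT b
# OPERATOR ==
# IDENT c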
Upvotes: 0
Reputation: 25483
Yes. The reason you get OPERATOR tokens instead of the expected PRODUCES/DEFINE tokens is the PLY lexer's token precedence rules:
Internally, lex.py uses the re module to do its pattern matching. When building the master regular expression, rules are added in the following order:
- All tokens defined by functions are added in the same order as they appear in the lexer file.
- Tokens defined by strings are added next by sorting them in order of decreasing regular expression length (longer expressions are added first).
Just convert certain rules into functions:
# Function rules are tried in the order they appear in the file, so
# t_PRODUCES must come before t_DEFINE, or r'=' would match the first
# character of '=>'.
def t_PRODUCES(t):
    r'=>'
    return t

def t_DEFINE(t):
    r'='
    return t
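A quick self-contained check of the reordered rules (the surrounding module and sample input are assumed for illustration):

import ply.lex as lex

tokens = ('IDENT', 'OPERATOR', 'DEFINE', 'PRODUCES')

t_IDENT = r'[a-zA-Z_][0-9a-zA-Z_]*'
t_OPERATOR = r'[<>=/*+-]+'
t_ignore = ' \t'

def t_PRODUCES(t):
    r'=>'
    return t

def t_DEFINE(t):
    r'='
    return t

def t_error(t):
    t.lexer.skip(1)

lexer = lex.lex()
lexer.input('a = b => c')
for tok in lexer:
    print(tok.type, tok.value)
# IDENT a
# DEFINE =
# IDENT b
# PRODUCES =>
# IDENT c

One caveat worth noting: because function rules always take precedence over the t_OPERATOR string rule, an input like a == b now lexes as two DEFINE tokens rather than OPERATOR('=='), which is the gap the reserved-word approach above avoids.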
Upvotes: 2