Jason S

Reputation: 189626

special-case lexer rule in ply

Is there a way to special-case a ply lexer rule?

t_IDENT     = r'[a-zA-Z_][0-9a-zA-Z_]*'
t_OPERATOR  = r'[<>=/*+-]+'
t_DEFINE    = r'='
t_PRODUCES  = r'=>'

I want to define an operator as any combination of the listed characters, except that = and => should be special-cased as their own token types. For example:

a + b
# IDENT('a') OPERATOR('+') IDENT('b') 

a ++=--> b
# IDENT('a') OPERATOR('++=-->') IDENT('b') 

a == b
# IDENT('a') OPERATOR('==') IDENT('b') 

a => b
# IDENT('a') PRODUCES('=>') IDENT('b') 

a = b
# IDENT('a') DEFINE('=') IDENT('b') 

a >= b
# IDENT('a') OPERATOR('>=') IDENT('b') 

a <=> b
# IDENT('a') OPERATOR('<=>') IDENT('b') 

Upvotes: 1

Views: 315

Answers (2)

Jason S

Reputation: 189626

I removed the string-based t_DEFINE and t_PRODUCES rules and used the reserved-word technique to handle the special cases:

special_operators = {'=': 'DEFINE',
                     '=>': 'PRODUCES'}

def t_OPERATOR(t):
    r'[<>=/*+-]+'
    t.type = special_operators.get(t.value, t.type)
    return t
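
For reference, a minimal self-contained lexer built around this rule might look like the sketch below; the tokens list, t_ignore, t_error, and the driver at the bottom are illustrative additions, not part of the original post:

import ply.lex as lex

tokens = ('IDENT', 'OPERATOR', 'DEFINE', 'PRODUCES')

special_operators = {'=': 'DEFINE',
                     '=>': 'PRODUCES'}

t_IDENT  = r'[a-zA-Z_][0-9a-zA-Z_]*'
t_ignore = ' \t'

def t_OPERATOR(t):
    r'[<>=/*+-]+'
    # Retag the token only when the whole lexeme is a special operator;
    # '==' and '>=' miss the dict lookup and stay OPERATOR.
    t.type = special_operators.get(t.value, t.type)
    return t

def t_error(t):
    t.lexer.skip(1)

lexer = lex.lex()
lexer.input('a => b == c')
for tok in lexer:
    print(tok.type, tok.value)
# IDENT 'a', PRODUCES '=>', IDENT 'b', OPERATOR '==', IDENT 'c'

Because t_OPERATOR always grabs the longest run of operator characters first, lexemes like == and <=> never split apart; the dictionary lookup only retags the exact lexemes = and =>.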

Upvotes: 0

Eldar Abusalimov

Reputation: 25483

Yes. The reason you get OPERATOR tokens instead of the expected PRODUCES/DEFINE tokens is the token precedence rule of the PLY lexer:

Internally, lex.py uses the re module to do its pattern matching. When building the master regular expression, rules are added in the following order:

  1. All tokens defined by functions are added in the same order as they appear in the lexer file.
  2. Tokens defined by strings are added next by sorting them in order of decreasing regular expression length (longer expressions are added first).

Just convert those rules into functions. One detail to watch: function rules are matched in the order they are defined, so the longer => pattern must be defined before =:

def t_PRODUCES(t):
    r'=>'
    return t

def t_DEFINE(t):
    r'='
    return t
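
Put together with the original string rules, a runnable sketch of this approach might look like the following (again, the tokens list, t_ignore, t_error, and the driver are illustrative additions):

import ply.lex as lex

tokens = ('IDENT', 'OPERATOR', 'DEFINE', 'PRODUCES')

t_IDENT    = r'[a-zA-Z_][0-9a-zA-Z_]*'
t_OPERATOR = r'[<>=/*+-]+'
t_ignore   = ' \t'

# Function rules are tried in definition order, so '=>' comes first.
def t_PRODUCES(t):
    r'=>'
    return t

def t_DEFINE(t):
    r'='
    return t

def t_error(t):
    t.lexer.skip(1)

lexer = lex.lex()
lexer.input('a => b = c')
for tok in lexer:
    print(tok.type, tok.value)
# IDENT 'a', PRODUCES '=>', IDENT 'b', DEFINE '=', IDENT 'c'

One caveat: because the function rules run before the string-defined t_OPERATOR, a run such as == lexes as two DEFINE tokens rather than OPERATOR('=='); the dictionary-lookup technique in the other answer avoids that.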

Upvotes: 2
