Aniket

Reputation: 31

Using regular definitions in PLY

When I use the following code snippet:

t_ASD = r'(a|aa*)'

on the input aaaaaaaa, the output is:

LexToken(ID,'aaaaaaaa',1,0)

That is what I expected. But when the same input is run against this code:

ASD = r'(a|aa*)'
@TOKEN(ASD)
def t_ASD(t):
    return t

the output is:

LexToken(ASD,'a',1,0)
LexToken(ASD,'a',1,1)
LexToken(ASD,'a',1,2)
LexToken(ASD,'a',1,3)
LexToken(ASD,'a',1,4)
LexToken(ASD,'a',1,5)
LexToken(ASD,'a',1,6)
LexToken(ASD,'a',1,7)

What could be the reason for this mismatch in output, and how can the second version be modified to produce LexToken(ID,'aaaaaaaa',1,0)?

Upvotes: 0

Views: 136

Answers (1)

rici

Reputation: 241681

It's evident from the output of your first example that the token is being matched by an ID rule, not by your ASD rule. Remember that patterns supplied as functions have priority over patterns supplied as string variables. (See the Ply manual.)
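As a sketch of what is presumably happening in your lexer (the `t_ID` rule here is an assumption; your actual identifier rule may differ), a function rule is tried before any string rule, so it captures the whole run of a's before `t_ASD` is ever consulted:

```python
import ply.lex as lex

tokens = ['ID', 'ASD']

# Function rules are tried first, in the order they are defined.
def t_ID(t):
    r'[a-zA-Z_][a-zA-Z_0-9]*'
    return t

# String rules are only tried after all function rules.
t_ASD = r'(a|aa*)'

t_ignore = ' \t\n'

def t_error(t):
    print("Bad char: '%s'" % t.value[0])
    t.lexer.skip(1)

lexer = lex.lex()
lexer.input('aaaaaaaa')
for tok in lexer:
    print(tok)    # the ID rule wins and consumes the whole run
```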

Here's my almost minimal test case, without interaction with other rules, which shows that using a pattern variable has the expected result:

import ply.lex as lex

tokens = ['A']

t_ignore = ' \t\n'    # PLY requires the name t_ignore

def t_error(t):
    print("Bad char: '%s'" % t.value[0])
    t.lexer.skip(1)   # skip() requires a character count

t_A = r'(a|aa*)'

lexer = lex.lex()
lexer.input('aaaaaaa')
for token in lexer: print(token)

Output (same output with python2):

$ python3 lexorder.py 
LexToken(A,'a',1,0)
LexToken(A,'a',1,1)
LexToken(A,'a',1,2)
LexToken(A,'a',1,3)
LexToken(A,'a',1,4)
LexToken(A,'a',1,5)
LexToken(A,'a',1,6)

It's the expected result because of the way Python regular expressions work: alternation is ordered, not longest-match. Since the first alternative a matches a single character, the engine never tries aa*, so each token is exactly one character long. To match the whole run, put the longer alternative first, r'(aa*|a)', or simply write r'a+'.
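You can verify the ordered-alternation behaviour with the re module directly, without involving PLY at all:

```python
import re

# The first alternative 'a' wins even though 'aa*' could match more.
print(re.match(r'(a|aa*)', 'aaaaaaaa').group())   # → a

# Reordering the alternatives, or using a+, yields the longest run.
print(re.match(r'(aa*|a)', 'aaaaaaaa').group())   # → aaaaaaaa
print(re.match(r'a+', 'aaaaaaaa').group())        # → aaaaaaaa
```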

Upvotes: 2
