Reputation: 31
When I use the following code snippet:-
t_ASD = r'(a|aa*)'
On input aaaaaaaa
The output comes out to be:-
LexToken(ID,'aaaaaaaa',1,0)
Which is expected. But when the same input is run on this code:-
ASD = r'(a|aa*)'
@TOKEN(ASD)
def t_ASD(t):
    return t
The output comes out to be
LexToken(ASD,'a',1,0)
LexToken(ASD,'a',1,1)
LexToken(ASD,'a',1,2)
LexToken(ASD,'a',1,3)
LexToken(ASD,'a',1,4)
LexToken(ASD,'a',1,5)
LexToken(ASD,'a',1,6)
LexToken(ASD,'a',1,7)
What can be the reason for this mismatch in output? And how can the second code be modified to obtain the output LexToken(ID,'aaaaaaaa',1,0)?
Upvotes: 0
Views: 136
Reputation: 241681
It's evident from the output of your first example that the token is being matched by the ID rule, not the ASD rule. Remember that patterns supplied as functions have priority over patterns supplied as variables. (See the Ply manual.)
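Your ID rule isn't shown in the question, but assuming a typical identifier pattern such as [a-z]+, a small sketch of that priority (my illustration, not code from the question):

```python
import ply.lex as lex

tokens = ['ID', 'ASD']
t_ignore = ' \t\n'

# Function rules are tried before string (variable) rules, so t_ID
# wins here even though t_ASD could also match at the same position.
def t_ID(t):
    r'[a-z]+'
    return t

t_ASD = r'(a|aa*)'

def t_error(t):
    t.lexer.skip(1)

lexer = lex.lex()
lexer.input('aaaaaaaa')
print(lexer.token())  # LexToken(ID,'aaaaaaaa',1,0)
```

That is exactly what happened in your first snippet: the whole run of a's was consumed by the ID function rule before the t_ASD variable rule was ever considered.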
Here's my almost minimal test case, without interaction with other rules, which shows that using a pattern variable has the expected result:
import ply.lex as lex
tokens = ['A']
t_ignore = ' \t\n'

def t_error(t):
    print("Bad char: '%s'" % t.value[0])
    t.lexer.skip(1)

t_A = r'(a|aa*)'
lexer = lex.lex()
lexer.input('aaaaaaa')
for token in lexer: print(token)
Output (same output with python2):
$ python3 lexorder.py
LexToken(A,'a',1,0)
LexToken(A,'a',1,1)
LexToken(A,'a',1,2)
LexToken(A,'a',1,3)
LexToken(A,'a',1,4)
LexToken(A,'a',1,5)
LexToken(A,'a',1,6)
It's the expected result because of the way Python regular expressions work. The Python regex engine does not implement longest-match semantics; it tries alternatives left to right and commits to the first one that matches, even if a later alternative would match a longer string. Since the first alternative in (a|aa*) is a, each match consumes exactly one character, which is why you get one token per a.
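You can see this behaviour in isolation with the standard re module (a small demonstration of my own):

```python
import re

# Ordered alternation: re tries alternatives left to right and
# commits to the first one that matches at the current position.
print(re.match(r'(a|aa*)', 'aaaaaaaa').group())  # 'a'

# Putting the longer alternative first (or collapsing the pattern
# to a+) restores longest-match behaviour for this pattern.
print(re.match(r'(aa*|a)', 'aaaaaaaa').group())  # 'aaaaaaaa'
print(re.match(r'a+', 'aaaaaaaa').group())       # 'aaaaaaaa'
```

So changing the pattern in your second snippet to ASD = r'aa*|a' (or simply r'a+') makes the lexer emit the single eight-character token you're after.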
Upvotes: 2