Reputation: 1
I'm currently building a lexical analyzer using PLY for Python. I'm trying to make it so that anything in between quotes " " is recognized as a string constant; however, as you can see from the output below, that is not what happens.
My current string constant definition:
def t_STRINGCONSTANT(t):
    r'\"([^\\\n]|(\\.))*?\"'
    return t
Input string:
'println( “factorialof10is” , fact (10), “rom the recursive function” );'
Output:
LexToken(PRINTLN,'println',1,0)
LexToken(LEFTPAREN,'(',1,7)
Illegal character '“'
LexToken(ID,'factorialof10is',1,10)
Illegal character '”'
LexToken(COMMA,',',1,27)
LexToken(ID,'fact',1,29)
LexToken(LEFTPAREN,'(',1,34)
LexToken(INTCONSTANT,10,1,35)
LexToken(RIGHTPAREN,')',1,37)
LexToken(COMMA,',',1,38)
Illegal character '“'
LexToken(ID,'rom',1,41)
LexToken(ID,'the',1,45)
LexToken(ID,'recursive',1,49)
LexToken(ID,'function',1,59)
Illegal character '”'
LexToken(RIGHTPAREN,')',1,69)
LexToken(SEMICOLON,';',1,70)
It seems that no matter what regex I use, I always get that output.
Upvotes: 0
Views: 587
Reputation: 241721
Look very closely at the quote marks shown in the Ply error message:
Illegal character '“'
Contrast with your regex:
r'\"
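They are different characters: the input uses the typographic quotes “ (U+201C) and ” (U+201D), while the regex only matches the ASCII double quote " (U+0022).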
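If the difference is hard to see in your font, printing the code points makes it explicit. This is a minimal sketch using only the standard library; the helper name `describe` is made up for illustration:

```python
import unicodedata

def describe(ch):
    """Return the code point and Unicode name of a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

# The two characters from the error messages, then the one in the regex.
for ch in ('\u201c', '\u201d', '"'):
    print(describe(ch))
```

Only the last of the three is the character the pattern is looking for.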
When preparing your test input, avoid using editors which automatically change quote marks into the fancier "typographic quotes".
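If you cannot control where the test input comes from, one possible workaround is to normalize the quotes before lexing. The sketch below is an assumption on my part rather than anything PLY does for you: `normalize_quotes`, `find_strings`, and the sample input are hypothetical, and it uses plain `re` with the same pattern as your rule so it runs without PLY installed.

```python
import re

# Same pattern as the t_STRINGCONSTANT rule in the question.
STRING_RE = re.compile(r'\"([^\\\n]|(\\.))*?\"')

# Hypothetical mapping: typographic double quotes -> ASCII double quote.
QUOTE_MAP = str.maketrans({'\u201c': '"', '\u201d': '"'})

def normalize_quotes(text):
    """Replace curly double quotes with the plain ASCII quote."""
    return text.translate(QUOTE_MAP)

def find_strings(text):
    """Return every substring the string-constant pattern matches."""
    return [m.group(0) for m in STRING_RE.finditer(text)]

# Sample input written with curly quotes (\u201c and \u201d).
sample = 'println( \u201chello\u201d );'

print(find_strings(sample))                    # nothing matches the curly quotes
print(find_strings(normalize_quotes(sample)))  # the ASCII-quoted string is found
```

With PLY you would pass the normalized text to `lexer.input(...)`. Whether silently rewriting the input is acceptable depends on your language; fixing the test file itself is the simpler option.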
Upvotes: 1