0xFF

Reputation: 1

How to define a string constant token in PLY?

I'm currently building a lexical analyzer using PLY for Python. I'm trying to make anything between double quotes " " be recognized as a string constant; however, as the output below shows, that is not what happens.

My current string constant definition:

def t_STRINGCONSTANT(t):
  r'\"([^\\\n]|(\\.))*?\"'
  return t  # a token rule must return t, otherwise PLY discards the match

Input string:

'println( “factori alof 10 is” , fact (10), “rom the recursive function” );'

Output:

LexToken(PRINTLN,'println',1,0)
LexToken(LEFTPAREN,'(',1,7)
Illegal character '“'
LexToken(ID,'factorialof10is',1,10)
Illegal character '”'
LexToken(COMMA,',',1,27)
LexToken(ID,'fact',1,29)
LexToken(LEFTPAREN,'(',1,34)
LexToken(INTCONSTANT,10,1,35)
LexToken(RIGHTPAREN,')',1,37)
LexToken(COMMA,',',1,38)
Illegal character '“'
LexToken(ID,'rom',1,41)
LexToken(ID,'the',1,45)
LexToken(ID,'recursive',1,49)
LexToken(ID,'function',1,59)
Illegal character '”'
LexToken(RIGHTPAREN,')',1,69)
LexToken(SEMICOLON,';',1,70)

It seems that no matter what regex I try, I always get this output.

Upvotes: 0

Views: 587

Answers (1)

rici

Reputation: 241721

Look very closely at the quote marks shown in the Ply error message:

Illegal character '“'

Contrast with your regex:

r'\"

Making them bigger can help:

“ "
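If the glyphs are still hard to tell apart, printing their code points removes any doubt. This is a small sketch using only the standard library; the helper name `describe` is my own and not part of Ply:

```python
import unicodedata

def describe(ch):
    """Return the code point and official Unicode name of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

# The ASCII quote used in the regex, then the two quotes from the test input.
for ch in ['"', '\u201c', '\u201d']:
    print(describe(ch))
# U+0022 QUOTATION MARK
# U+201C LEFT DOUBLE QUOTATION MARK
# U+201D RIGHT DOUBLE QUOTATION MARK
```

The regex only mentions U+0022, so the lexer has no rule that can start at U+201C and reports it as an illegal character.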

When preparing your test input, avoid using editors which automatically change quote marks into the fancier "typographic quotes".
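If you cannot control where the test input comes from, one possible workaround is to convert typographic quotes to ASCII quotes before handing the text to the lexer. The sketch below assumes that is acceptable for your language; `normalize_quotes` and `CURLY_TO_ASCII` are hypothetical names, and the check uses plain `re` with the pattern from the question rather than Ply itself:

```python
import re

# The pattern from the question, unchanged.
STRING_RE = re.compile(r'\"([^\\\n]|(\\.))*?\"')

# Assumption: curly double quotes should be treated as ordinary ASCII quotes.
CURLY_TO_ASCII = str.maketrans({'\u201c': '"', '\u201d': '"'})

def normalize_quotes(text):
    """Replace typographic double quotes with the ASCII quotation mark."""
    return text.translate(CURLY_TO_ASCII)

raw = 'println( \u201chello\u201d );'

print(STRING_RE.search(raw))                             # None
print(STRING_RE.search(normalize_quotes(raw)).group(0))  # "hello"
```

With Ply you would pass the normalized text to `lexer.input(...)`. Note that this also rewrites curly quotes that appear inside string constants, so fixing the test input itself is usually the cleaner option.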

Upvotes: 1
