Reputation: 91
I am studying lexers in the Programming Languages course by Westley Weimer.
The notes are here: https://www.udacity.com/wiki/cs262/unit-2#quiz-rule-order
The video, if you care to watch, is here (the relevant part is the last 40 seconds): https://www.udacity.com/course/viewer#!/c-cs262/l-48713810/e-48652568/m-48676965
Quiz: When two token definitions can match the same string, the behavior of our lexical analyzer may be ambiguous.....
Suppose we have the input string
hello, "world,"
and we want the input string to yield WORD STRING. Which rule must come last? That is, "...what I'd like you to do is tell me which one of these functions, which one of these rules, would have to come last, bearing in mind that the one that comes first wins all ties in order for hello, "world" to break down into a word followed by a string."
```python
def t_WORD(token):
    r'[^ <>]+'
    return token

def t_STRING(token):
    r'"[^"]*"'
    return token
```
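To see why there is a tie at all, here is a small check using only Python's standard re module (not PLY itself) that applies each of the two patterns to the quoted part of the input. The pattern [^ <>]+ excludes only spaces and angle brackets, so it matches the quote characters too.

```python
import re

# The two token patterns from the quiz, copied from the rule docstrings.
WORD = r'[^ <>]+'
STRING = r'"[^"]*"'

# The quoted part of the input string from the question.
text = '"world,"'

# re.match anchors at position 0, much as a lexer tries each rule
# at the current input position.
word_match = re.match(WORD, text)
string_match = re.match(STRING, text)

print(word_match.group())    # "world,"
print(string_match.group())  # "world,"
# Both rules consume exactly the same characters, so it is a tie.
print(len(word_match.group()) == len(string_match.group()))  # True
```

Because both rules match the same eight characters, length alone cannot decide between them; only rule order can.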
The answer is:
t_STRING, then t_WORD (so t_WORD must come last).
I don't get it, even after watching the video multiple times. Why does STRING have priority over WORD?
Please enlighten me.
Thank you very much.
Upvotes: 0
Views: 760
Reputation: 241881
Both patterns match "world": the WORD pattern [^ <>]+ excludes only spaces and angle brackets, so it also matches the quote characters, and both matches have the same length. The desired token, however, is STRING. For t_STRING to be returned, it needs priority, so it must be placed first: when two or more patterns produce the same longest match, the earliest pattern wins.
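The tie-breaking rule can be sketched with a tiny hand-written lexer. This is a simplified stand-in written for illustration, not how PLY is implemented internally; the tokenize helper and the rule lists are hypothetical names. At each position it keeps the longest match, and on equal length it keeps the rule listed earliest. Spaces are assumed to be skipped between tokens.

```python
import re

# Hypothetical helper for illustration: `rules` is an ordered list of
# (token_type, regex) pairs. The longest match wins; among equally long
# matches, the rule that appears earliest in the list wins.
def tokenize(text, rules):
    compiled = [(name, re.compile(pattern)) for name, pattern in rules]
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos] == ' ':  # assumption: spaces are ignored between tokens
            pos += 1
            continue
        best_name, best_value = None, ''
        for name, regex in compiled:
            m = regex.match(text, pos)
            # Strict '>' means a later rule never displaces an earlier one on a tie.
            if m and len(m.group()) > len(best_value):
                best_name, best_value = name, m.group()
        if best_name is None:
            raise ValueError(f'no rule matches at position {pos}')
        tokens.append((best_name, best_value))
        pos += len(best_value)
    return tokens

text = 'hello, "world,"'
string_first = [('STRING', r'"[^"]*"'), ('WORD', r'[^ <>]+')]
word_first = [('WORD', r'[^ <>]+'), ('STRING', r'"[^"]*"')]

print(tokenize(text, string_first))  # [('WORD', 'hello,'), ('STRING', '"world,"')]
print(tokenize(text, word_first))    # [('WORD', 'hello,'), ('WORD', '"world,"')]
```

With STRING listed first, the input yields WORD STRING, the result the quiz asks for; with WORD listed first, the quoted part comes back as a second WORD, because WORD wins the tie.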
Upvotes: 1