goughgough

Reputation: 91

Python lexer token rule order and ambiguities: why does STRING have priority over WORD?

I am studying lexers in the Programming Languages course by Westley Weimer.

The notes are here: https://www.udacity.com/wiki/cs262/unit-2#quiz-rule-order

{Video, if you care to watch, last 40 seconds.} https://www.udacity.com/course/viewer#!/c-cs262/l-48713810/e-48652568/m-48676965

Quiz: When two token definitions can match the same string, the behavior of our lexical analyzer may be ambiguous...

Suppose we have the input string

hello, "world,"

and we want the input string to yield WORD STRING. Which rule must come last? From the video: "...what I'd like you to do is tell me which one of these functions, which one of these rules, would have to come last, bearing in mind that the one that comes first wins all ties in order for hello, "world" to break down into a word followed by a string."

def t_WORD(token):
    r'[^ <>]+'
    return token

def t_STRING(token):
    r'"[^"]*"'
    return token

The answer is:

t_STRING, then t_WORD

I don't get it even after watching the video multiple times. Why does STRING have priority over WORD?

Please enlighten.

Thank you very much.

Upvotes: 0

Views: 760

Answers (1)

rici

Reputation: 241881

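Both patterns match the quoted text "world," in full: it contains no space, < or >, so t_WORD matches it, and it starts and ends with a double quote, so t_STRING matches it too. The desire is that the token t_STRING be returned. To do that, t_STRING needs to have priority, so it must be placed first, because if there are two or more patterns with the same longest match, the earliest pattern wins. That leaves t_WORD as the rule that comes last.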
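To see the tie-break in action, here is a minimal sketch using only Python's `re` module. The `tokenize` helper is hypothetical and is not the course's lexer; it simply assumes the rule described here (longest match wins, and the earliest rule wins ties) and skips spaces between tokens.

```python
import re

def tokenize(text, rules):
    """Hypothetical helper: longest match wins, earliest rule wins ties."""
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos] == ' ':
            pos += 1  # assumption: spaces only separate tokens
            continue
        best_name, best_end = None, pos
        for name, pattern in rules:
            m = re.compile(pattern).match(text, pos)
            # Strict '>' means a later rule must match MORE text to
            # displace an earlier rule, so the earlier rule wins ties.
            if m and m.end() > best_end:
                best_name, best_end = name, m.end()
        if best_name is None:
            raise ValueError(f"no rule matches at position {pos}")
        tokens.append((best_name, text[pos:best_end]))
        pos = best_end
    return tokens

# The two patterns from the question.
WORD = ('WORD', r'[^ <>]+')
STRING = ('STRING', r'"[^"]*"')

text = 'hello, "world,"'
string_first = tokenize(text, [STRING, WORD])
word_first = tokenize(text, [WORD, STRING])

print(string_first)  # [('WORD', 'hello,'), ('STRING', '"world,"')]
print(word_first)    # [('WORD', 'hello,'), ('WORD', '"world,"')]
```

With STRING listed first, the quoted text comes out as a STRING. With WORD listed first, both patterns still match the same eight characters, but WORD claims the tie and the result is WORD WORD.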

Upvotes: 1
