Why do we count a string as a single token in lexical analysis (compiler design)?

I am learning about compiler design. The lexical analyser's job is to convert the source code into a stream of tokens, but I am confused about why a string is treated as a single token. For example, in the statement printf("%d is integer", x); the tokens are printf, (, "%d is integer", ,, x, ), and ;. Why isn't the %d inside the string a separate token?

Upvotes: 1

Answers (1)

John Bode

Reputation: 123558

Because format specifiers like %d are not syntactically meaningful: no rule of the language grammar depends on them. The contents of a string literal, format specifiers included, are data, not code, so the compiler has no reason to interpret them. The character sequence %d only means something at runtime, only to the printf and scanf families of functions (printf, fprintf, sprintf, scanf, fscanf, and so on), and only when it appears in a format string.

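To recognize %d as a distinct token, the lexer would have to break the whole literal into pieces such as ", %d, is, integer, and ". That opens up a whole can of worms: the grammar would then need rules for what may appear inside a string, and the lexer would have to preserve details such as the exact spacing between those pieces, all for content the compiler never interprets.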

Some compilers (GCC and Clang, for example, with -Wformat) do examine the format-string arguments of printf and scanf calls to perform basic sanity checks, but that happens well after tokenization. At the tokenization stage, the compiler doesn't yet know that this is a call to the printf library function. Only after syntax analysis does it know that this is a call to a specific library function and can perform that kind of check.

Upvotes: 2
