Reputation: 33
I am learning about compiler design. The task of the lexical analyser in a compiler is to convert the source code into a stream of tokens. But I am confused about why we consider a string to be a single token. For example: `printf("%d is integer", x);`
In this statement, `printf`, `(`, `"%d is integer"`, `,`, `x`, `)`, and `;` are the tokens, but why is the `%d` in the string not considered a separate token?
Upvotes: 1
Views: 833
Reputation: 123558
Because format specifiers like `%d` (or any other string contents) are not syntactically meaningful - there's no element of the language grammar that depends on them. String contents (including format specifiers like `%d`) are data, not code, and thus not meaningful to the compiler. The character sequence `%d` is only meaningful at runtime, and only to the `*printf`/`*scanf` families of functions, and only as part of a format string.
To recognize `%d` as a distinct token, you would have to tokenize the entire string - `"`, `%d`, `is`, `integer`, `"`. That opens up a whole can of worms on its own, making parsing of strings more difficult.
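To make that concrete, here is a minimal sketch of a regex-based tokenizer for a tiny C-like subset, written in Python for brevity. The token categories (`STRING`, `IDENT`, `NUMBER`, `PUNCT`) and the `tokenize` function are names I made up for illustration, not how any real C compiler is organized. The point is only that one rule swallows everything between the quotes, so the lexer never looks at `%d` at all.

```python
import re

# Hypothetical token categories for a tiny C-like subset; not a full C lexer.
TOKEN_SPEC = [
    ("STRING", r'"(?:\\.|[^"\\\n])*"'),  # the whole literal, escapes included
    ("IDENT",  r"[A-Za-z_]\w*"),
    ("NUMBER", r"\d+"),
    ("PUNCT",  r"[(),;]"),
    ("SKIP",   r"\s+"),                  # whitespace is matched, then dropped
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(source):
    """Return a list of (kind, text) pairs for the given source string."""
    tokens = []
    pos = 0
    while pos < len(source):
        m = MASTER.match(source, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        if m.lastgroup != "SKIP":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


for kind, text in tokenize('printf("%d is integer", x);'):
    print(kind, text)
```

Running it on the statement from the question yields seven tokens, and the string literal is exactly one of them. Nothing in the `STRING` rule cares whether the contents are `%d`, `hello`, or anything else.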
Some compilers do examine the format string arguments to `printf` and `scanf` calls to do some basic sanity checking, but that's well after tokenization has already taken place. At the tokenization stage, you don't know that this is a call to the `printf` library function. It's not until after syntax analysis that the compiler knows that this is a specific library call and can perform that kind of check.
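As a rough illustration of what such a later check might look like, here is a sketch in Python. It assumes the parser has already recognized a call and hands over the callee name plus a list of `(kind, text)` argument pairs; that interface, along with the names `count_conversions` and `check_printf_call`, is hypothetical. The regex covers only common conversion specifiers (no `*` widths), and real compilers also check argument types, which this does not attempt.

```python
import re

# Simplified pattern for printf conversions: flags, width, precision,
# length modifier, conversion letter. "%%" is matched so it can be skipped.
SPEC = re.compile(
    r"%(?:%|[-+ #0]*\d*(?:\.\d+)?(?:hh|h|ll|l|j|z|t|L)?[diouxXeEfFgGaAcspn])"
)


def count_conversions(format_text):
    """Count conversions that consume an argument ("%%" does not)."""
    return sum(1 for m in SPEC.finditer(format_text) if m.group() != "%%")


def check_printf_call(callee, args):
    """Return a warning string, or None if nothing looks wrong.

    `args` is a list of (kind, text) pairs; this shape is an assumption
    made for the sketch, not any real compiler's AST.
    """
    if callee != "printf" or not args or args[0][0] != "STRING":
        return None  # not a call this check knows how to inspect
    fmt = args[0][1][1:-1]  # strip the surrounding quotes
    expected = count_conversions(fmt)
    supplied = len(args) - 1
    if expected != supplied:
        return f"format expects {expected} argument(s), but {supplied} supplied"
    return None


print(check_printf_call("printf", [("STRING", '"%d is integer"'), ("IDENT", "x")]))
print(check_printf_call("printf", [("STRING", '"%d is integer"')]))
```

Note that the check needs to know the callee is `printf` and which argument is the format string. That information only exists once the call expression has been parsed, which is why the lexer cannot do this.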
Upvotes: 2