Alexey Frunze

Reputation: 62048

#line and string literal concatenation

Given this piece of C code:

char s[] =

"start"

#ifdef BLAH
"mid"
#endif

"end";

what should the output of the preprocessor be? In other words, what should the actual compiler receive and be able to handle? To narrow the possibilities, let's stick to C99.

I'm seeing that some preprocessors output this:

#line 1 "tst00.c"
char s[] =

"start"
#line 9
"end";

or this:

# 1 "tst00.c"
char s[] =

"start"




# 7 "tst00.c"


"end";

gcc -E outputs this:

# 1 "tst00.c"
# 1 "<command-line>"
# 1 "tst00.c"
char s[] =

"start"





"end";

gcc compiles all of the above preprocessed outputs without complaint, even with the -fpreprocessed option, which tells it that the input has already been preprocessed and that no further preprocessing should be done.

The confusion stems from this wording of the 1999 C standard:

5.1.1.2 Translation phases
1 The precedence among the syntax rules of translation is specified by the following
  phases.
...
4. Preprocessing directives are executed, macro invocations are expanded, and
_Pragma unary operator expressions are executed. ... All preprocessing directives are
then deleted.
...
6. Adjacent string literal tokens are concatenated.
7. White-space characters separating tokens are no longer significant. Each
preprocessing token is converted into a token. The resulting tokens are syntactically
and semantically analyzed and translated as a translation unit.

In other words, is it legal for the #line directive to appear between adjacent string literals? If it is, it means that the actual compiler must do another round of string literal concatenation, but that's not mentioned in the standard.

Or are we simply dealing with non-standard compiler implementations, gcc included?

Upvotes: 0

Views: 70

Answers (1)

Potatoswatter

Reputation: 137800

The #line or # 1 lines you get from GCC -E (or a compatible tool) are added for the sake of human readers and any tools that might attempt to work with a text form of the output of the preprocessor. They are just for convenience.

In general, yes, directives may appear between concatenated string literal tokens. #line is no different from #ifdef in your example.

"Or are we simply dealing with non-standard compiler implementations, gcc included?"

-E and -fpreprocessed modes are not standardized. A standard preprocessor always feeds its output into a compiler, not a text file. Moreover:

The output of the preprocessor has no standard textual representation.

The reason for inserting #line directives is so that any __LINE__ and __FILE__ macros that you might insert into the already-preprocessed file, before preprocessing it again, will expand correctly. When compiling such a file, the compiler may also use these values when reporting errors. Usage of "preprocessed text files" is nonstandard and generally discouraged.

Upvotes: 1
