Waleen _

Reputation: 31

How does the C preprocessor handle not repeating includes and copying more than once

I recently learned how the C compiler actually works, and there is a specific configuration that I do not understand. When I write a header, it looks like this:

#ifndef HEADERNAME
#define HEADERNAME
//function declarations
#endif

As far as I understand, this means that once the preprocessor has gone through the header file (and pasted its contents in place of the #include that asked for that file), it has defined HEADERNAME and will skip that header the next time. Does this mean that the file is only copied once?

If so, I have an issue with the following configuration:

We have a few files: main.c A.c A.h B.c B.h D.c D.h.

-main.c includes A.h B.h and D.h,

-A and B include D.h.

If the preprocessor goes through D.h and copies it into A, will it be able to copy D.h into B, given that it has already gone through D.h? And if it copies it only once, how can the code of B.c access the declarations of the functions in D.h (since they are needed later in compilation)?

I tried this configuration and saw that it does work but I don't understand how.

Upvotes: 2

Views: 138

Answers (2)

Mike Kinghan

Reputation: 61575

A scenario such as you have in mind would be:

$ tail -n +1 *.h *.c
==> one.h <==
#ifndef ONE_H
#define ONE_H

extern int one(void);

#endif

==> three.h <==
#ifndef THREE_H
#define THREE_H

extern int three(void);

#endif

==> two.h <==
#ifndef TWO_H
#define TWO_H

extern int two(void);

#endif

==> main.c <==
#include "three.h"
#include "two.h"
#include "one.h"
#include <stdio.h>

int main(void)
{
    printf("%d %d %d\n",one(),two(),three());
}
//EOF

==> one.c <==
#include "one.h"

int one(void) { return 1; }

//EOF

==> three.c <==
#include "one.h"
#include "three.h"

int three(void) { return 2 + one(); }

//EOF

==> two.c <==
#include "one.h"
#include "two.h"

int two(void) { return 1 + one(); }

//EOF

$ gcc -o prog main.c three.c two.c one.c
$ ./prog
1 2 3

where we've built the program with the GNU C compiler, gcc. You might be using a different compiler and details may vary, but the principle is the same, so we'll assume gcc. We see that two.c compiles successfully, even though it requires the declaration of int one(void) from one.h and three.c was compiled before it. Compiling three.c also pulled that declaration from one.h and therefore defined the header guard ONE_H, which would seem to prevent the declaration from being consumed again (e.g. when compiling two.c).

The fact that you are not aware of, or are not considering, is this:

A command to compile and link multiple source files and output a program, such as:

$ gcc -o prog main.c three.c two.c one.c

is just a convenient shorthand for:

# Compile & link pipeline
$ gcc -c -o <tmp-main>.o main.c
$ gcc -c -o <tmp-three>.o three.c
$ gcc -c -o <tmp-two>.o two.c
$ gcc -c -o <tmp-one>.o one.c
$ gcc -o prog <tmp-main>.o <tmp-three>.o <tmp-two>.o <tmp-one>.o

Where:

  • The option -c means compile *.c source files to *.o object files; don't link them.

  • <tmp-filename>.o is a temporary file being the object file compiled from filename.c

  • The final command is the linkage of all the temporary object files into the executable prog.

That is what happens behind the scenes when you run:

$ gcc -o prog main.c three.c two.c one.c

And that is because in reality, C source files must be preprocessed and compiled to object files one at a time. The entire source input that goes to produce one object file, i.e. a *.c source file as recursively expanded by the preprocessor, is called a translation unit. By definition one translation unit yields one object file. Once a translation unit yields an object file the preprocessor and compiler have nothing further to do with it: the object file is consumed by the linker.

You could do all this explicitly yourself with:

$ gcc -c -o main.o main.c
$ gcc -c -o three.o three.c
$ gcc -c -o two.o two.c
$ gcc -c -o one.o one.c
$ gcc -o prog main.o three.o two.o one.o
$ ./prog
1 2 3

None of the gcc -c commands in this sequence passes any information to the next one. Only the final linkage command receives anything from the prior commands, namely the object files they have generated, and nothing else.

You can see then that:

$ gcc -c -o <tmp-two>.o two.c

does not have the information that

$ gcc -c -o <tmp-three>.o three.c

(or any other command) defined the header guard ONE_H in some translation unit on some occasion in the past, and this information is irrelevant to the command. When compiling two.c, the preprocessor includes the declaration of int one(void) from one.h because ONE_H is as yet undefined in this translation unit. It then defines ONE_H to stop the guarded content of one.h from being included repeatedly in the same translation unit.

Upvotes: 0

Vlad from Moscow

Reputation: 311088

You have three translation (or preprocessing translation) units:

  1. main.c with include directives that explicitly include A.h, B.h and D.h. In turn files A.h and B.h include the header D.h. So you have

    A.h D.h B.h D.h D.h main.c

The body of D.h enclosed in the directives

#ifndef HEADERNAME
#define HEADERNAME
//function declarations
#endif

is processed by the preprocessor only once in this translation unit, because on the second and third inclusions of D.h the name HEADERNAME is already defined.

  2. A.c with the header A.h that in turn includes the header D.h
  3. B.c with the header B.h that in turn includes the header D.h

So the code in A.c and B.c has access to declarations in the header D.h because the header is included in each translation unit (2 and 3).

From the C Standard (6.10.2 Source file inclusion):

2 A preprocessing directive of the form

# include < h-char-sequence > new-line

searches a sequence of implementation-defined places for a header identified uniquely by the specified sequence between the < and > delimiters, and causes the replacement of that directive by the entire contents of the header. How the places are specified or the header identified is implementation-defined.

and (6.10.1 Conditional inclusion):

6 Each directive’s condition is checked in order. If it evaluates to false (zero), the group that it controls is skipped: directives are processed only through the name that determines the directive in order to keep track of the level of nested conditionals; the rest of the directives’ preprocessing tokens are ignored, as are the other preprocessing tokens in the group. Only the first group whose control condition evaluates to true (nonzero) is processed; any following groups are skipped and their controlling directives are processed as if they were in a group that is skipped. If none of the conditions evaluates to true, and there is a #else directive, the group controlled by the #else is processed; lacking a #else directive, all the groups until the #endif are skipped.

Upvotes: 3
