Why does the compiler generate object files (.o) only from .cpp files?

Question

As the title says: why does the compiler generate object files (.o) only from .cpp files and not from header files? And how does the linker know how to link object files together if the implementation is in .h files?

Mike Kinghan · Accepted Answer

Why does the compiler generate object files (.o) only from .cpp files, not header files?

For concreteness, I'll assume the compiler is GCC's C++ compiler.

The compiler will compile a header file to an object file, if you make clear that's what you really want.

header.h

#ifndef HEADER_H
#define HEADER_H

#include <iostream>

inline void hw()
{
    std::cout << "Hello World" << std::endl;
}

#endif

If you simply do:

$ g++ header.h

then it won't generate an object file, because it assumes from the .h extension that you don't want it to. Instead, it will generate a precompiled header file, header.h.gch.

This is a reasonable assumption because usually we don't want to compile a header file directly to an object file. Usually, we don't want to compile a header file directly at all, and if we do, what we want is a pre-compiled header file.

But if you actually do want header.h compiled to header.o, you can insist on it like this:

$ g++ -c -x c++ header.h

which says: Compile, without linking, header.h, treating it as a C++ source file. And the output is header.o.

This header.o is pretty useless, however. It does not, for instance, export the solitary function hw to the linker, because the function is inline and nothing in the file uses it, so the compiler emits no code for it. If we look at the symbols in the object file:

$ objdump -C -t header.o

header.o:     file format elf64-x86-64

SYMBOL TABLE:
0000000000000000 l    df *ABS*  0000000000000000 header.h
0000000000000000 l    d  .text  0000000000000000 .text
0000000000000000 l    d  .data  0000000000000000 .data
0000000000000000 l    d  .bss   0000000000000000 .bss
0000000000000000 l     O .bss   0000000000000001 std::__ioinit
0000000000000000 l     F .text  000000000000003e __static_initialization_and_destruction_0(int, int)
000000000000003e l     F .text  0000000000000015 _GLOBAL__sub_I_header.h
0000000000000000 l    d  .init_array    0000000000000000 .init_array
0000000000000000 l    d  .note.GNU-stack    0000000000000000 .note.GNU-stack
0000000000000000 l    d  .eh_frame  0000000000000000 .eh_frame
0000000000000000 l    d  .comment   0000000000000000 .comment
0000000000000000         *UND*  0000000000000000 std::ios_base::Init::Init()
0000000000000000         *UND*  0000000000000000 .hidden __dso_handle
0000000000000000         *UND*  0000000000000000 std::ios_base::Init::~Init()
0000000000000000         *UND*  0000000000000000 __cxa_atexit

we see there's nothing there but boilerplate and things pulled in by #include <iostream>.
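To see that hw is missing only because nothing uses it, here is a minimal sketch of a source file in which the inline function is used. The file name header_used.cpp, the helper address_of_hw and the hw_fn alias are made up for illustration, and hw returns its text instead of printing it so that the result is easy to check.

```cpp
// header_used.cpp (hypothetical file name, not part of the original example)
#include <string>

// Same shape as hw() in header.h, but returning the text instead of printing it.
inline std::string hw()
{
    return "Hello World";
}

// Taking the address of hw() counts as a use, so the compiler has to emit
// an out-of-line definition of hw() into this file's object file.
using hw_fn = std::string (*)();

hw_fn address_of_hw()
{
    return &hw;
}
```

If you compile this with g++ -c and list the symbols as before, hw should now appear. With GCC on an ELF platform it typically shows up as a weak symbol rather than an ordinary global one, and that is what allows the linker to keep a single copy when several object files contain the same inline function.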

We could make header.h serviceable for linkage by deleting the keyword inline. Then if we recompile as before and have another look:

$ objdump -C -t header.o | grep hw
0000000000000061 l     F .text  0000000000000015 _GLOBAL__sub_I__Z2hwv
0000000000000000 g     F .text  0000000000000023 hw()

We've exported hw()! We can link header.o in a program!

main.cpp

extern void hw();

int main()
{
    hw();
    return 0;
}

Compile that:

$ g++ -c main.cpp

Link:

$ g++ -o prog main.o header.o

Run:

$ ./prog
Hello World

But there's a snag. Now that we've defined hw() in header.h so that the linker can see it, we can't use header.h in the way that header files are normally used any more, i.e. we can't #include "header.h" in more than one of the .cpp files that are compiled and linked together in the same program:

main1.cpp

extern void foo();
extern void bar();

int main()
{
  foo();
  bar();
  return 0;
}

foo.cpp

#include "header.h"

void foo(){
    hw();
}

bar.cpp

#include "header.h"

void bar(){
    hw();
}

Compile them all:

$ g++ -c main1.cpp foo.cpp bar.cpp

All good. So link:

$ g++ -o prog main1.o foo.o bar.o
bar.o: In function `hw()':
bar.cpp:(.text+0x0): multiple definition of `hw()'
foo.o:foo.cpp:(.text+0x0): first defined here
collect2: error: ld returned 1 exit status

No good, because hw() is defined twice, once in foo.o and again in bar.o, and that's a linkage error: the linker can't pick one definition rather than the other.

So you see that the compiler is willing and able to compile a .h file as a C++ source file if you insist; it's able and willing to compile a .blahblah file as a C++ source if you insist, assuming there is legal C++ in the .blahblah file. But a header file compiled to an object file is of little or no use to us.

The distinction between a .h file and a .cpp file is just a conventional distinction as to how we intend the file to be used. If we give it a .h extension we are saying: All the C++ in this file can safely be included in multiple translation units (.cpp files) that are compiled and linked together. If we give it a .cpp extension we are saying: At least some of the C++ in this file can only be compiled and linked once in the same linkage.

The header.h that we started with was a proper header file, according to this convention. The header.h from which we deleted inline was no longer a header file according to this convention. We should have renamed it to something .cpp, if we don't just like confusing people.
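As a rough sketch of that convention, here are the two usual ways to keep a header safe to include in several .cpp files. The names hw_inline and hw_declared and the file header.cpp are hypothetical, and the two files are shown in one listing only so that it compiles as a single unit; in a real project they would be separate files.

```cpp
// ---- header.h (sketch): may be included by many .cpp files ----
#ifndef HEADER_H
#define HEADER_H

#include <string>

// Option 1: keep the definition in the header and keep it inline.
inline std::string hw_inline()
{
    return "Hello World";
}

// Option 2: only declare the function here; define it in exactly one .cpp file.
std::string hw_declared();

#endif

// ---- header.cpp (sketch): compiled and linked once ----
// In a real project this file would begin with: #include "header.h"
std::string hw_declared()
{
    return "Hello World";
}
```

Either way, foo.cpp and bar.cpp can both #include "header.h" and be linked into the same program: with option 1 the linker is allowed to merge the duplicate inline copies, and with option 2 there is only one definition to begin with.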

How does the linker know how to link object files together if the implementation is in .h files?

The linker links nothing but object files and libraries. It doesn't know anything about .cpp files or .h files: they might as well not exist as far as the linker is concerned. There are three ways that "implementation" in a header file can get to the linker.

1) The unconventional way that we just discussed: by compiling the header file to an object file, which is linked. As you've seen, there is no technical problem in doing that, though in practice it's never done.

2) The usual way, by #include-ing the header file in a .cpp file.

hello.h

#ifndef HELLO_H
#define HELLO_H

static char const * hw = "Hello world";

#endif

hello.cpp

#include "hello.h"
char const * hello = hw;

In this case, the compiler preprocesses hello.cpp before it even starts to generate object code, and you can see what the compiler sees after the preprocessor is finished by telling the compiler to do the preprocessing and nothing else:

$ g++ -P -E hello.cpp
static char const * hw = "Hello world";
char const * hello = hw;

The output of that command is the translation unit that will get compiled into hello.o, and as you see, the code in hello.h is simply copied into the translation unit in the place of #include "hello.h".

So by the time the compiler starts to generate hello.o, the header hello.h is irrelevant: it might as well not exist.
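The same textual copying explains what the #ifndef/#define guard in hello.h is for. The sketch below writes out by hand what the compiler would effectively be given if hello.h were included twice in the same .cpp file, for instance once directly and once through another header. It is an illustration under that assumption, not literal preprocessor output.

```cpp
// ---- first #include "hello.h" ----
#ifndef HELLO_H
#define HELLO_H
static char const * hw = "Hello world";
#endif

// ---- second #include "hello.h": HELLO_H is already defined, so the body is skipped ----
#ifndef HELLO_H
#define HELLO_H
static char const * hw = "Hello world";
#endif

// ---- the rest of hello.cpp ----
char const * hello = hw;
```

Without the guard, the second copy would define hw again in the same translation unit and the compiler would reject it as a redefinition.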

3) By compiling the header.h file into a precompiled header.h.gch. The header.h.gch is a "semi-compiled" form of header.h that will be #include-ed, if it exists, whenever #include "header.h" or #include <header.h> appears in the code. The only difference is that the semi-compiled header.h.gch can be processed faster than header.h: (3) is just a faster version of (2), with the limitation that the compiler will only accept one precompiled header per compilation.

Whether it gets there by (1), (2) or (3), the linkage of code from a .h file is no different from the linkage of code from a .cpp file. All code is compiled by the compiler. The compiler doesn't care whether code originates in a .h file or .cpp file. The compiler generates object files, and the linker links the object files.
