Reputation: 913
As the title says: why does the compiler generate object files (.o) only from .cpp files and not from header files? And how does the linker know how to link object files together if the implementation is in .h files?
Upvotes: 1
Views: 4858
Reputation: 61452
Why does the compiler only generate object files .o only from .cpp files not header files
For concreteness, I'll assume the compiler is GCC's C++ compiler.
The compiler will compile a header file to an object file, if you make clear that's what you really want.
header.h
#ifndef HEADER_H
#define HEADER_H
#include <iostream>
inline void hw()
{
    std::cout << "Hello World" << std::endl;
}
#endif
If you simply do:
$ g++ header.h
then it won't generate an object file, because it assumes from the .h
extension that you don't want it to. Instead, it will generate
a precompiled header file, header.h.gch.
This is a reasonable assumption because usually we don't want to compile a header file directly to an object file. Usually, we don't want to compile a header file directly at all, and if we do, what we want is a pre-compiled header file.
But if you actually do want header.h compiled to header.o, you can insist
on it like this:
$ g++ -c -x c++ header.h
which says: Compile, without linking, header.h, treating it as a C++ source file.
And the output is header.o.
This header.o is pretty useless, however. It does not, for instance, export the
solitary function hw to the linker, because the function is declared inline and
nothing in the file uses it, so the compiler emits no code for it. If we
look at the symbols in the object file:
$ objdump -C -t header.o
header.o: file format elf64-x86-64
SYMBOL TABLE:
0000000000000000 l df *ABS* 0000000000000000 header.h
0000000000000000 l d .text 0000000000000000 .text
0000000000000000 l d .data 0000000000000000 .data
0000000000000000 l d .bss 0000000000000000 .bss
0000000000000000 l O .bss 0000000000000001 std::__ioinit
0000000000000000 l F .text 000000000000003e __static_initialization_and_destruction_0(int, int)
000000000000003e l F .text 0000000000000015 _GLOBAL__sub_I_header.h
0000000000000000 l d .init_array 0000000000000000 .init_array
0000000000000000 l d .note.GNU-stack 0000000000000000 .note.GNU-stack
0000000000000000 l d .eh_frame 0000000000000000 .eh_frame
0000000000000000 l d .comment 0000000000000000 .comment
0000000000000000 *UND* 0000000000000000 std::ios_base::Init::Init()
0000000000000000 *UND* 0000000000000000 .hidden __dso_handle
0000000000000000 *UND* 0000000000000000 std::ios_base::Init::~Init()
0000000000000000 *UND* 0000000000000000 __cxa_atexit
we see there's nothing there but boilerplate and things pulled in by #include <iostream>.
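As a small sketch of why the symbol is missing (this is not part of the session above): an inline function is only emitted into an object file when the translation unit actually uses it. One way to force that, assuming a hypothetical variable hw_ptr is added to the file, is to take the function's address:

```cpp
#include <iostream>

// Same function as in header.h, still declared inline.
inline void hw()
{
    std::cout << "Hello World" << std::endl;
}

// Taking the address odr-uses hw(), so the compiler has to emit
// its code into this translation unit's object file.
// (hw_ptr is a hypothetical name introduced for this sketch.)
void (*hw_ptr)() = &hw;
```

Compiled the same way as before, objdump should now list hw(). With GCC on ELF targets it typically appears as a weak symbol rather than an ordinary global one, which is what lets the linker tolerate identical copies coming from several object files.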
We could make header.h serviceable for linkage by deleting the keyword inline.
Then if we recompile as before and have another look:
$ objdump -C -t header.o | grep hw
0000000000000061 l F .text 0000000000000015 _GLOBAL__sub_I__Z2hwv
0000000000000000 g F .text 0000000000000023 hw()
We've exported hw()! We can link header.o in a program!
main.cpp
extern void hw();
int main()
{
    hw();
    return 0;
}
Compile that:
$ g++ -c main.cpp
Link:
$ g++ -o prog main.o header.o
Run:
$ ./prog
Hello World
But there's a snag. Now that we've defined hw() in header.h so that
the linker can see it, we can't use header.h in the way that header files
are normally used any more, i.e. we can't #include "header.h" in more than
one of the .cpp files that are compiled and linked together in the same program:
main1.cpp
extern void foo();
extern void bar();
int main()
{
    foo();
    bar();
    return 0;
}
foo.cpp
#include "header.h"
void foo(){
    hw();
}
bar.cpp
#include "header.h"
void bar(){
    hw();
}
Compile them all:
$ g++ -c main1.cpp foo.cpp bar.cpp
All good. So link:
$ g++ -o prog main1.o foo.o bar.o
bar.o: In function `hw()':
bar.cpp:(.text+0x0): multiple definition of `hw()'
foo.o:foo.cpp:(.text+0x0): first defined here
collect2: error: ld returned 1 exit status
No good, because hw() is defined twice, once in foo.o and again in bar.o,
and that's a linkage error: the
linker can't pick one definition rather than the other.
So you see that the compiler is willing and able to compile a .h
file as a C++ source file if you insist; it's able and willing to
compile a .blahblah file as a C++ source if you insist,
assuming there is legal C++ in the .blahblah file. But a header
file compiled to an object file is of little or no use to us.
The distinction between a .h file and a .cpp file is just a conventional
distinction as to how we intend the file to be used. If we
give it a .h extension we are saying: All the C++ in this file can safely be
included in multiple translation units (.cpp files) that are compiled and linked
together. If we give it a .cpp extension we are saying: At least some of the C++ in this
file can only be compiled and linked once in the same linkage.
The header.h that we started with was a proper header file, according to this
convention. The header.h from which we deleted inline was no longer a header
file according to this convention. We should have renamed it to something .cpp,
unless we just like confusing people.
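To make the convention concrete, here is a rough sketch of a header that keeps the promise. The file name and every identifier in it are hypothetical, chosen only for illustration; each item is a kind of C++ that may appear in several translation units of one program.

```cpp
// safe_header.h (hypothetical): everything here may appear in many
// translation units of one program without a multiple-definition error.
#ifndef SAFE_HEADER_H
#define SAFE_HEADER_H

// A declaration only; the definition would live in exactly one .cpp file.
int add(int a, int b);

// An inline function definition.
inline int twice(int x) { return 2 * x; }

// A function template definition.
template <typename T>
T square(T x) { return x * x; }

// A class definition.
struct Point { int x; int y; };

// A namespace-scope constant; constexpr gives it internal linkage.
constexpr int kAnswer = 42;

#endif
```

What does not belong here is an ordinary, non-inline function definition such as the body of add: that is the kind of code that can only be compiled and linked once, so it goes in a .cpp file.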
How the linker knows how to link object files together if the implementation is in .h files
The linker links nothing but object files and libraries. It doesn't know anything
about .cpp files or .h files: they might as well not exist as far as the linker
is concerned. There are three ways that "implementation" in a header file can get to
the linker.
1) The unconventional way that we just discussed: by compiling the header file to an object file, which is linked. As you've seen, there is no technical problem in doing that, though in practice it's never done.
2) The usual way, by #include-ing the header file in a .cpp file.
hello.h
#ifndef HELLO_H
#define HELLO_H
static char const * hw = "Hello world";
#endif
hello.cpp
#include "hello.h"
char const * hello = hw;
In this case, the compiler preprocesses hello.cpp
before it even starts to generate object code, and you can see what the compiler sees after
the preprocessor is finished by telling the compiler to do the preprocessing and nothing else:
$ g++ -P -E hello.cpp
static char const * hw = "Hello world";
char const * hello = hw;
The output of that command is the translation unit that will get compiled into
hello.o, and as you see, the code in hello.h is simply copied into the
translation unit in the place of #include "hello.h".
So by the time the compiler starts to generate hello.o, the header hello.h
is irrelevant: it might as well not exist.
3) By compiling the header.h file into a pre-compiled header.h.gch. The
header.h.gch is a "semi-compiled" form of header.h that will be #include-ed,
if it exists, whenever #include "header.h" or #include <header.h> appears
in the code. The only difference is that the semi-compiled header.h.gch can
be processed faster than header.h: (3) is just a faster version of (2) (and it has the limitation that the compiler will only accept one precompiled header per compilation).
Whether it gets there by (1), (2) or (3), the linkage of code
from a .h file is no different from the linkage of code from a .cpp file.
All code is compiled by the compiler. The compiler doesn't care whether code
originates in a .h file or .cpp file. The compiler generates object files,
and the linker links the object files.
Upvotes: 5