Armen Michaeli

Is it possible to force, say, `gcc` to produce identical binaries from identical source code compiled as C and as C++?

Say one has a C program like the following:

int main(int argc, char** argv)
{
    return 0;
}

I have two files with the source code above, one with a `.c` extension and another with `.cpp`. I compile them as C and C++ programs, respectively. The binaries are different. I thought C++ was a "zero overhead" language? :-) What I am looking for are compiler flags for the two setups such that the resulting binaries are the same, preferably some kind of language standard rather than GCC extensions of any kind.

Answers (3)

josefx

Of course they are different; even simple things like function names are handled differently by C and C++. `void foo()` in C is simply `foo`. In C++ that name gets mangled, since the plain name does not contain enough information to distinguish multiple `foo` functions with different parameter lists, such as `void foo(int)`.
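A minimal sketch of the difference, assuming g++ on a platform that uses the Itanium C++ ABI (for example x86-64 Linux); the names `foo` and `bar` are hypothetical. The two overloads of `foo` get distinct mangled symbols, while `extern "C"` asks the C++ compiler to emit the plain C name instead.

```cpp
// Overloads are legal in C++ because each one gets its own mangled symbol.
// Under the Itanium C++ ABI, foo() becomes _Z3foov and foo(int) becomes _Z3fooi.
int foo() { return 0; }
int foo(int x) { return x * 2; }

// extern "C" turns mangling off: this is emitted as plain "bar", the same name
// a C compiler would produce. Because of that, at most one function named bar
// may have C linkage; a second extern "C" bar(double) would be ill-formed.
extern "C" int bar(int x) { return x + 1; }
```

Running `nm` on the object file produced by `g++ -c` should list `_Z3foov`, `_Z3fooi` and `bar`; `nm -C` demangles the first two back to `foo()` and `foo(int)`. `main` is never mangled, which is one reason the program in the question comes out so similar in both languages.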

Then there are the different standard libraries, which are linked in by default because most C and C++ programs use them (as far as your zero-overhead claim goes, this can be disabled).

Most important, the rules about what is well-defined behavior differ: C++ is not a superset of C, and while there is a large amount of overlap, there are many cases where the two languages differ. See, for example, `sizeof('a')` in C and C++.
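A small sketch that makes the `sizeof('a')` difference concrete. It is written in the common subset of C and C++ (the function name `char_literal_size` is just for illustration), so the same file can be compiled as either language.

```cpp
/* Valid as both C and C++: save it as .c or .cc and compile with gcc or g++. */
#include <stddef.h>

/* In C, the character literal 'a' has type int, so this yields sizeof(int)
   (commonly 4). In C++, 'a' has type char, so this yields 1. */
size_t char_literal_size(void)
{
    return sizeof('a');
}
```

Identical source, different observable behavior, so the two compilers are not even allowed to generate the same code for it.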

Conclusion: C and C++ compilers producing identical binaries from identical source code is possible, but extremely unlikely.

Jonathan Wakely

The binaries are different.

How are they different?

GCC will embed information about the original source filenames and the options used into the output file, so there will always be some differences for different filenames, even if the content is identical.

If I compile your program as C and as C++, the only difference I see is that the C++ version is linked to libstdc++, which happens automatically when using g++ to link. If I instead use gcc to link, the binaries are almost identical.

N.B. you can use gcc to compile C++ programs: the gcc and g++ binaries are just drivers that look at the filename and invoke the correct compiler binary (cc1 for C, cc1plus for C++) to do the actual compilation. See http://gcc.gnu.org/onlinedocs/gcc/Invoking-G_002b_002b.html for more details.

This shows that for identical source code, the only difference in the assembler output is the string giving the original filename, and the object files are the same size:

$ cat f.c
cat: f.c: No such file or directory
$ rm f.c
$ cat > f.c
int main(int argc, char** argv)
{
    return 0;
}
$ ln -s f.c f.cc
$ gcc f.c -S -o f.c.s
$ g++ f.cc -S -o f.cxx.s
$ diff f.c*.s
--- f.c.s       2012-08-26 13:45:58.109711329 +0100
+++ f.cxx.s     2012-08-26 13:46:00.482634256 +0100
@@ -1,4 +1,4 @@
-       .file   "f.c"
+       .file   "f.cc"
        .text
        .globl  main
        .type   main, @function
$ gcc f.c -c -o f.c.o
$ g++ f.cc -c -o f.cxx.o
$ ls -l f.c*.o
-rw-rw-r--. 1 jwakely users 1240 Aug 26 13:46 f.c.o
-rw-rw-r--. 1 jwakely users 1240 Aug 26 13:46 f.cxx.o

In the final executable, the difference comes from how it is linked, i.e. whether or not the C++ standard library is linked in:

$ gcc f.c.o -o a.c.out
$ gcc f.cxx.o -o a.cxx.out
$ g++ f.cxx.o -o a.cxx.libstdcxx.out
$ ls -l a.c*.out
-rwxrwxr-x. 1 jwakely users 6323 Aug 26 13:48 a.c.out
-rwxrwxr-x. 1 jwakely users 6468 Aug 26 13:48 a.cxx.libstdcxx.out
-rwxrwxr-x. 1 jwakely users 6324 Aug 26 13:48 a.cxx.out

If you don't need the C++ standard library, don't link to it.
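To illustrate when the C++ standard library *is* needed: as soon as code uses something like `new`, the object file refers to `operator new`, which GCC provides in libstdc++. This is a sketch; the function names are hypothetical, and the mangled name in the comment assumes the Itanium C++ ABI on x86-64 Linux.

```cpp
// Allocating with new emits a call to ::operator new(std::size_t), mangled as
// _Znwm on x86-64 Linux. That symbol lives in libstdc++, so an object file
// containing this code links with g++ (which adds -lstdc++ automatically) but
// fails with "undefined reference" under plain gcc unless -lstdc++ is passed.
int* make_counter(int start)
{
    return new int(start);
}

// Matching deallocation, which likewise references a ::operator delete overload.
void destroy_counter(int* p)
{
    delete p;
}
```

The empty `main` from the question uses nothing of the sort, which is why linking its object file with `gcc` works.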

Emilio Garavaglia

This is just another version of the "how short is my empty main" false problem.

There are parts of the "infrastructure", carrying the startup and shutdown code as well as the standard library's global objects, that must be linked in regardless of what the program actually does.

Measuring an empty main program is in fact measuring the size of the startup and shutdown code. That must differ between C and C++, since C++ has many more things to do to prepare for calling main than C does.
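One concrete example of that extra work, as a sketch (the names `Tracer`, `init_calls` and `global_tracer` are hypothetical): a namespace-scope object with a non-trivial constructor has to be constructed as part of program startup. GCC emits an initialization function for it and registers it with the startup code (on ELF platforms typically through `.init_array`). Standard C needs nothing comparable, because C only permits constant initializers for objects with static storage duration.

```cpp
#include <cstdio>

// Constant-initialized, so it is already 0 before any constructor runs.
int init_calls = 0;

struct Tracer {
    // This constructor has side effects, so it cannot be evaluated at compile
    // time; it has to run as part of program startup.
    Tracer()
    {
        ++init_calls;
        std::puts("Tracer constructed before main");
    }
};

// Dynamically initialized: with GCC, the runtime calls Tracer::Tracer() for
// this object before main() is entered.
Tracer global_tracer;
```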

I don't know what you mean by "zero-overhead language". Neither C nor C++ is one. They both minimize overhead in their respective domains. The only zero-overhead language is, by definition, native machine code.
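What "zero overhead" usually means for C++ is narrower: features you do not use cost nothing, and features you do use cost no more than a hand-written equivalent. A hedged sketch of the first half (the struct names are made up, and the sizes are what mainstream ABIs such as the one g++ uses on Linux produce, not something the standard guarantees):

```cpp
// A plain aggregate, as it would be written in C.
struct CPoint {
    int x;
    int y;
};

// The same data with a non-virtual member function. Member functions are
// ordinary functions taking an implicit 'this' parameter; they add no
// per-object storage, so the layout matches CPoint on mainstream ABIs.
struct Point {
    int x;
    int y;
    int sum() const { return x + y; }
};

// Opting into dynamic dispatch is where C++ starts charging: on common ABIs
// each object gains a hidden pointer to its class's vtable.
struct VPoint {
    int x;
    int y;
    virtual int sum() const { return x + y; }
    virtual ~VPoint() = default;
};
```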