Somatic

Reputation: 193

Is there a reason even my tiniest .c files always compile to at least 128-kilobyte executables?

I am using Dev-C++, which compiles using GCC, on Windows 8.1, 64-bit.

I noticed that all my .c files compile to .exe files of at least 128 KB, no matter how small the source is. Even a simple "Hello, world!" came out at 128 KB. Source files with more lines of code increased the size of the executable as I would expect, but every file started off at 128 KB or more, as if that were some sort of minimum size.

I know .exe files don't actually have a minimum size like that; .kkrieger is a full first-person shooter with 3D graphics and sound that fits inside a single 96 KB executable.

Trying to get to the bottom of this, I opened up my hello_world.exe in Notepad++. Perhaps my compiler adds a lengthy header that happens to be 128kb, I thought.

Unfortunately, I don't know enough about executables to be able to make sense of it, though I did find strings like "Address %p has no image-section VirtualQuery failed for %d bytes at address %p" buried among the usual garble of characters in an .exe.

Of course, this isn't a serious problem, but I'd like to know why it's happening.

Why is this 128kb minimum happening? Does it have something to do with my 64-bit OS, or perhaps with a quirk of my compiler?

Upvotes: 2

Views: 388

Answers (1)

Mason Watmough

Reputation: 495

Short answer: it depends.

Long answer: it depends on what operating system you have and how it handles executables.

Most (if not all) compilers don't turn your source code straight into a bare blob of x86/ARM/other machine code. Instead, after they compile your source into a .o (object) file, they "link" the .o together with the libraries it uses, producing a file in a standard executable format. These executable formats are system-specific file formats that wrap the actual machine code in headers and tables telling the OS loader where to place each section in memory, which libraries to load, and where execution starts. The CPU then runs that machine code directly; the OS does not interpret it.

For example, I'll talk about the executable format used on Linux devices: ELF, which comes in a 32-bit and a 64-bit variant. ELF stands for Executable and Linkable Format. Every ELF file starts off with a 4-byte "magic number", which is simply the byte 0x7F followed by the string "ELF" in ASCII. The next byte is set to either 1 or 2, which signifies that the program is for 32-bit or 64-bit architectures, respectively. After that comes another byte that signifies the program's endianness (1 for little-endian, 2 for big-endian). A few more fields describe the target architecture and so on, until the header reaches its full size: 52 bytes for the 32-bit variant and 64 bytes for the 64-bit one.
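The first few bytes described above can be checked in a handful of lines of C. This is only a minimal sketch: the struct elf_ident type and the parse_elf_ident function are names made up for this example, not part of any real library, and it looks only at the magic number, the class byte, and the endianness byte.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper type for this sketch; not part of any real API. */
struct elf_ident {
    int bits;          /* 32 or 64, taken from the class byte at offset 4 */
    int little_endian; /* 1 = little-endian, 0 = big-endian (byte at offset 5) */
};

/* Returns 0 and fills *out if buf begins with a valid ELF identification.
   Returns -1 if the buffer is too short or the magic number, class byte,
   or endianness byte is not recognised. */
int parse_elf_ident(const unsigned char *buf, size_t len, struct elf_ident *out)
{
    static const unsigned char magic[4] = { 0x7F, 'E', 'L', 'F' };

    assert(buf != NULL && out != NULL);

    /* We need at least the 4 magic bytes plus the class and data bytes. */
    if (len < 6 || memcmp(buf, magic, sizeof magic) != 0)
        return -1;

    if (buf[4] == 1)
        out->bits = 32;
    else if (buf[4] == 2)
        out->bits = 64;
    else
        return -1;

    if (buf[5] == 1)
        out->little_endian = 1;
    else if (buf[5] == 2)
        out->little_endian = 0;
    else
        return -1;

    return 0;
}
```

To try it, read the first six bytes of a file with fread and pass them in. A Windows .exe starts with "MZ" instead, so the function should return -1 for it.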

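However, 64 bytes is not even close to the 128 KB that you have stated. Part of the gap is that the Windows .exe format is more complex, but most of it comes from the runtime code that gets linked into your program. For a plain .c file built with a MinGW-based GCC, that is mainly the C runtime startup code plus symbol information (the "VirtualQuery failed" string you found most likely belongs to that startup code). For C++ programs, the C++ standard library is the usual culprit. For instance, let's have a look at a common use of the C++ iostream library: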

#include <iostream>
int main()
{
    std::cout<<"Hello, World!"<<std::endl;
    return 0;
}

This program may compile to a very large executable on a Windows system, because once you use iostream, the linker has to pull in a large part of the C++ standard library (the streams, locales, and everything they depend on). When that library is linked statically, all of that code ends up inside your .exe.

So, how do we rectify this problem? Simple: Use the C standard library implementation for C++!

#include <cstdio>
int main()
{
    std::printf("Hello, World!\n");
    return 0;
}

Simply using the original C standard library can shrink a C++ program considerably, in some setups from a couple hundred KB down to a small fraction of that. The reason is that g++ links every program against the C++ standard library (libstdc++) by default, and iostream drags a large part of it in, whereas printf only needs the C runtime.
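If you want to check the difference yourself rather than take my word for it, build both versions and compare the sizes of the resulting files. Here is a minimal sketch of a helper that counts the bytes in a file using only the C standard library; file_size_bytes is a name I made up for this example.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: returns the size of the file at path in bytes,
   or -1 if the file cannot be opened or read. */
long file_size_bytes(const char *path)
{
    unsigned char chunk[4096];
    long total = 0;
    size_t n;
    FILE *f;

    assert(path != NULL);

    /* "rb" matters on Windows: text mode would translate line endings. */
    f = fopen(path, "rb");
    if (f == NULL)
        return -1;

    /* Read the file in chunks and add up how many bytes came back. */
    while ((n = fread(chunk, 1, sizeof chunk, f)) > 0)
        total += (long)n;

    if (ferror(f)) {
        fclose(f);
        return -1;
    }

    fclose(f);
    return total;
}
```

Call it once on each executable you built and print the two results side by side.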

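However, sometimes you absolutely need to use the C++-specific libraries. In that case, the toolchain has options for controlling what gets linked. With GCC, -nodefaultlibs is an option you pass to gcc/g++ (it is a driver option, not an option of the ld linker itself) that stops the default system libraries from being linked at all, so you then have to name every library you need yourself. It works the same way with the MinGW builds of GCC on Windows. Of course, this can very quickly break a TON of calls in programs that use a lot of the standard C++ library. If the goal is just to drop code you never call, compiling with -ffunction-sections -fdata-sections and linking with -Wl,--gc-sections is the gentler route, and -s strips the symbol information from the finished executable.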

So, in the end, if size is what you're worried about, I would focus on rewriting your program to use the regular C functions instead of the newfangled C++ ones, as amazing as they are.

Upvotes: 2
