What does a gcc output file look like and what exactly does it contain?

Question

When compiling a C file, gcc by default writes its output to a file called "a.out". My professor said that the output file contains the binaries, but when I open it I usually encounter unreadable text (VS Code says something like "This file contains unsupported text encoding").
I assumed that 'binaries' meant I would see literal zeroes and ones in the file, but that does not seem to be the case. So what exactly does the output file look like, what exactly does it contain, and what is 'text encoding'? Why can I not read it? What special characters might it contain? I'm aware that gcc first preprocesses the source, which means it removes all comments, expands all macros and copies in the contents of any included header files. You get the preprocessed file by running gcc -E .c; this preprocessed file is then compiled into assembly. Up to this point the output files are readable, i.e., I can open them with VS Code, but after this the assembled code and the object file thereafter are not human-readable.

For reference, I have no prior experience with programming or any language for that matter and this is my first CS related course in my first sem of college, and I apologize if this is too trivial of a question to ask.

bolov · Accepted Answer

I actually had the same confusion early on. Not about that file type specifically, but about binary vs text files.

After all, aren't all files, even text ones, binary? In the sense that all information is 1s and 0s? Well, yes, all information can be stored/transmitted as 1s and 0s, but that's not what binary/text files refer to.

It refers to what that information, the content of the file, those 1s and 0s represent.

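In a text file the bytes encode characters. In a binary file the bits encode some information that is not text. The format and semantics of that information are completely free: it can mean anything and use whatever encoding scheme. It's up to the application that writes/reads the file to properly understand the bit patterns. In the case of a.out, the bits are mostly machine instructions and data for your program, arranged in an executable file format that the operating system knows how to load.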

Most text editors (like VS Code) treat any file they open as a text file, i.e. they try to interpret the bit patterns according to a text encoding scheme (e.g. ASCII or UTF-8). But not all bit patterns are valid ASCII/UTF-8, which is why you get "unsupported text encoding".

If you want to inspect the actual 1s and 0s of a file, whether text or binary, you need a utility that shows them, e.g. a hex viewer/editor.
