Reputation: 879
For the sake of specifics, let's consider the GCC compiler, latest version.
Consider the statement int i = 7;
In assembly it will be something like
MOV 7, R1
This will put the value seven into register R1. The exact instruction may not be important here.
In my understanding, the compiler will then convert the MOV instruction to a processor-specific opcode. Then it will allocate a (possibly virtual) register. Then the constant value 7 needs to go into that register.
My question:
How is the 7 actually converted to binary?
Does the compiler actually divide by 2 repeatedly to get the binary representation? (Maybe afterwards it converts to hex, but let's stay at the binary step.)
Or, considering that the 7 is written as a character in a text file, is there a clever lookup-table-based technique to convert any string representing a number to a binary value?
If the current GCC compiler uses a built-in function to convert the string 7 to the binary 0111, then how did the first compiler convert a text-based string to a binary value?
Thank you.
Upvotes: 1
Views: 749
Reputation: 76
If the current GCC compiler uses built in function to convert a string 7 to a binary 0111, then how did the first compiler convert a text based string to a binary value?
This is a chicken-and-egg problem, but simply put, compilers are built step by step, and at some point the compiler is written in its own language, e.g. a C compiler written in C.
Before answering your question, we should define what we mean by "compilation", i.e. what a compiler does. Simply put, compilation is a pipeline: it takes your high-level code, performs some operations on it and generates (machine-specific) assembly code, and then the machine's assembler takes that assembly code and converts it into a binary object file.
At the compiler level, all it does is produce the corresponding assembly in a text file. The assembler is another program that takes this text file and converts it into "binary" format.
The assembler can also be written in C. Here we also need a mapping, i.e. movl->(0000110101110...), but this one is binary, not ASCII, and we need to write this binary into a file as-is.
Converting numbers into binary format is also redundant, because once a number has been read into memory it is already held in binary form.
How they are converted/placed into memory is a matter for the operating system's loader, which is beyond my knowledge.
Upvotes: 0
Reputation: 213822
How is the 7 actually converted to binary?
First of all, there's a distinction between the binary base 2 number format and what professional programmers call "a binary executable", meaning generated machine code, most often displayed in hex for convenience. Addressing the latter meaning:
Disassemble the binary (for example at https://godbolt.org/) and see for yourself:
int main (void)
{
    int i = 7;
    return i;
}
This does indeed get translated to something like the following (x86-64, with optimization enabled):
mov eax,0x7
ret
Translated into machine code (opcode bytes, shown in hex):
B8 07 00 00 00
C3
where B8 = mov eax, imm32, B9 = mov ecx, imm32 and so on (the opcode is B8 plus the register number). The 7 is translated into 07 00 00 00, since this form of mov expects a 4-byte immediate and this is a little-endian CPU.
And this is the point where the compiler/linker stops caring. The instruction encoding is defined by the CPU's instruction set, and the platform's ABI (Application Binary Interface) fixes conventions such as returning an int in eax. From here on, executing this machine code is up to the CPU.
As for how this makes it into the hardware in the actual form of base 2 binary... it's already in that form. Everything we see on a PC is a translation for the convenience of human users, who find decimal or hex easier to read than raw binary.
Upvotes: 2