Reputation: 481
From what I understand, when you compile a program (say, a C program), the compiler takes your code and outputs an executable program in binary format (i.e. machine code for the target architecture).
Within this binary you're going to have instructions that point to addresses in memory to load data/instructions from other parts of the program.
Given that this program will be loaded into memory at some arbitrary location, how does the program know what these memory addresses are? How are they set/calculated, and whose job is it to do this?
For example, does the binary just have placeholders for the memory locations that are replaced by the OS when it loads it into memory for the first time?
If it needs to dynamically load a shared library how does it work out where the memory location is for that?
How does 'virtual memory' come into play with this (if at all)?
Upvotes: 0
Views: 1349
Reputation: 71616
MMUs allow the OS to create the same address space (think addresses zero to N) for each application such that each application can be compiled for a known address space. There isn't much need for relocation in this situation. Even in the DOS days you could/would have a fixed offset relative to some data segment so that the applications could have an assumed address space.
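To make the "same address space for each application" point concrete, here is a toy sketch of what the MMU does. The page size, the page-table contents and the names `translate`, `process_a` and `process_b` are all made up for illustration; real page tables are multi-level hardware structures, not Python dicts.

```python
PAGE_SIZE = 0x1000  # 4 KiB pages (assumed; a common size, not the only one)

def translate(page_table, virtual_address):
    """Map a virtual address to a physical one via a per-process page table."""
    page, offset = divmod(virtual_address, PAGE_SIZE)
    frame = page_table[page]  # a missing entry would correspond to a page fault
    return frame * PAGE_SIZE + offset

# Two hypothetical processes, both linked to run at virtual address 0x00401000.
process_a = {0x401: 0x12}  # virtual page 0x401 -> physical frame 0x12
process_b = {0x401: 0x7F}  # same virtual page -> a different physical frame

va = 0x00401007
pa_a = translate(process_a, va)
pa_b = translate(process_b, va)
print(hex(va), "->", hex(pa_a), "and", hex(pa_b))
```

Both processes use the identical virtual address, so neither binary needs patching; only the OS-maintained mapping differs.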
The Linux kernel bootstrap is one place where you will see relocation; the kernel itself not so much, though that may have changed in recent years.
Loadable modules and shared libraries are one place where you might see relocation required. For at least the popular processors (ARM, x86, MIPS) running the popular operating systems (Linux, Windows, macOS), the code itself can be built to be relocatable without modification so long as all of its references are relative to itself, which is what is assumed.
Data relative to code is a different matter: if you want to be able to move the data independently, some form of table is typical. The table sits at a fixed position relative to the code (or is found through some other linked mechanism) and records where the data starts, or where specific items or markers in the data start, so that other data references can be made relative to those.
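A rough simulation of that table idea, in case it helps. Everything here is hypothetical (`TABLE_OFFSET`, `load`, `read_data`, the addresses); it only shows that the code finds the table relative to itself and that the loader has to patch one table slot rather than every instruction that touches data.

```python
import struct

TABLE_OFFSET = 0x100  # assumed fixed distance from the start of the code to the table

def load(memory, code_base, data_base):
    """Hypothetical loader: the only thing it patches is the single table slot."""
    struct.pack_into("<I", memory, code_base + TABLE_OFFSET, data_base)

def read_data(memory, code_base, item_offset, length):
    """What the code does: locate the table relative to itself, then follow it."""
    (data_base,) = struct.unpack_from("<I", memory, code_base + TABLE_OFFSET)
    start = data_base + item_offset
    return bytes(memory[start:start + length])

memory = bytearray(0x4000)  # toy flat memory
data = b"Some data" + b"Other data"
memory[0x3000:0x3000 + len(data)] = data  # data placed wherever the loader likes

load(memory, code_base=0x1000, data_base=0x3000)
item = read_data(memory, code_base=0x1000, item_offset=9, length=10)
print(item)
```

If the data were placed somewhere else, only the value written by `load` would change; the "code" in `read_data` stays the same.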
Upvotes: 1
Reputation: 5805
how does the program know what these memory addresses are?
The program (and its author) does not know what the memory address will be when it is loaded into memory; it only knows where the placeholder is, relative to the start of its segment. That's why the compiler accompanies each such placeholder with a relocation record. A relocation is a piece of information which tells the OS or the linker where the placeholder is located and which segment its value is relative to, so that it can be patched once the real address is known.
Consider the following simple piece of source code for a Windows Portable Executable program:
[.text]
Main:NOP
LEA ESI,[Mem]
; more instructions
[.data]
DB "Some data"
Mem: DB "Other data"
which will be converted to machine instructions and memory data:
|[.text] |[.text]
|00000000:90 |Main:NOP
|00000001:8D35[09000000] | LEA ESI,[Mem]
|00000007: | ; more instructions
|[.data] |[.data]
|00000000:536F6D6520646174~| DB "Some data"
|00000009:4F74686572206461~|Mem: DB "Other data"
The compiler does not know the virtual address of Mem; it only knows that it is located 0x00000009 bytes from the start of the .data segment. So it puts this temporary number into the operation code of LEA ESI,[Mem] and creates a relocation for the placeholder (located in segment .text at offset 0x00000003) which is relative to segment .data.
At link time the linker decides that the .text segment will be loaded at virtual address 0x00401000 and the .data segment at VA 0x00402000. The linker then reads the relocation record and modifies the placeholder by adding 0x00402000. The instruction LEA ESI,[Mem] in the linked executable will then be 8D3509204000, where 0x00402009 (stored little-endian as 09204000) is the final fixed-up virtual address of Mem. We'll be able to see that address in a debugger at run time.
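The fix-up step can be replayed in a few lines of Python. This is only a sketch of the arithmetic using the bytes and addresses from the listing; the helper name `apply_relocation` is mine, and a real linker reads the offset and target segment from the object file's relocation records instead of taking them as arguments.

```python
import struct

def apply_relocation(section, offset, target_base):
    """Add the target segment's virtual address to the 32-bit placeholder."""
    (placeholder,) = struct.unpack_from("<I", section, offset)  # little-endian
    fixed = (placeholder + target_base) & 0xFFFFFFFF
    struct.pack_into("<I", section, offset, fixed)

# .text as emitted by the compiler: NOP, then LEA ESI,[Mem] with placeholder 9.
text = bytearray.fromhex("90" "8D35" "09000000")
DATA_VA = 0x00402000  # address the linker chose for .data in the example

# The relocation record says: patch .text at offset 3, relative to .data.
apply_relocation(text, offset=3, target_base=DATA_VA)
print(text.hex().upper())  # 908D3509204000
```

The last six bytes, 8D3509204000, are the LEA instruction with 0x00402009 as its operand.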
Relocations are present in linked executable files, too (16-bit DOS MZ or Windows PE), in case they cannot be loaded at the virtual image base address assumed at link time. Linking shared-object (SO) libraries in Linux is more complicated; see chapter 2, Dynamic Linking, in http://www.skyfree.org/linux/references/ELF_Format.pdf
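For the load-time case, the idea is the same but the loader adds the difference between the actual and the assumed base. A minimal sketch, assuming an image base of 0x00400000 and a made-up actual load address of 0x00A50000 (neither is stated above); `rebase` is a hypothetical helper, and for simplicity the relocation offsets are offsets into this byte string rather than real PE relocation entries.

```python
import struct

def rebase(image, reloc_offsets, preferred_base, actual_base):
    """Shift every relocated 32-bit address by (actual_base - preferred_base)."""
    delta = actual_base - preferred_base
    for off in reloc_offsets:
        (value,) = struct.unpack_from("<I", image, off)
        struct.pack_into("<I", image, off, (value + delta) & 0xFFFFFFFF)

# Linked bytes from the example: NOP, then LEA ESI,[0x00402009].
image = bytearray.fromhex("908D3509204000")

# Assumed: linked for base 0x00400000, but the OS could only map it at 0x00A50000.
rebase(image, reloc_offsets=[3], preferred_base=0x00400000, actual_base=0x00A50000)
print(image.hex().upper())  # 908D350920A500
```

If the image does load at its preferred base, the delta is zero and nothing needs patching.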
Upvotes: 4