MysteriousWaffle

Reputation: 481

How do memory addresses in binary programs point to the right place in memory at runtime?

From what I understand, when you compile a program (let's say a C program, for example), the compiler takes your code and outputs an executable program in binary format (i.e. machine code for the targeted architecture).

Within this binary you're going to have instructions that point to addresses in memory to load data/instructions from other parts of the program.

Given that this program will be loaded into memory at some arbitrary location, how does the program know what these memory addresses are? How are they set or calculated, and whose job is it to do this?

For example, does the binary just have placeholders for the memory locations that are replaced by the OS when it loads it into memory for the first time?

If it needs to dynamically load a shared library, how does it work out where the memory location is for that?

How does 'virtual memory' come into play with this? (if at all)

Upvotes: 0

Views: 1349

Answers (2)

old_timer

Reputation: 71616

An MMU lets the OS give every application the same virtual address space (think addresses zero to N), so each application can be compiled and linked for a known address space. There isn't much need for relocation in that situation. Even in the DOS days you could have a fixed offset relative to some data segment, so applications could assume an address space.
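A toy model of that idea, assuming 4 KiB pages and a single-level page table kept as a dictionary (real MMUs walk multi-level tables in hardware). The process names and frame numbers are invented for illustration:

```python
PAGE_SIZE = 0x1000  # assumed 4 KiB pages


def translate(page_table, virtual_address):
    """Map a virtual address to a physical one through a per-process page table."""
    page, offset = divmod(virtual_address, PAGE_SIZE)
    frame = page_table[page]  # a missing key would correspond to a page fault
    return frame * PAGE_SIZE + offset


# Both processes were linked for the same virtual page 0x401,
# but the OS backs that page with a different physical frame for each.
process_a = {0x401: 0x1A2}
process_b = {0x401: 0x7F0}

print(hex(translate(process_a, 0x00401234)))
print(hex(translate(process_b, 0x00401234)))
```

The same virtual address lands in two different physical locations, which is why neither program needs to be patched for where it really sits in RAM.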

The kernel bootstrap for Linux is one place where you will see relocation. The kernel itself, not so much, though that may have changed in recent years.

Loadable modules and shared libraries are one place where you might see relocation required. For at least the popular operating systems (Linux, Windows, macOS) on the popular processors (ARM, x86, MIPS), the code itself can be built to be relocatable without modification, so long as every reference in it is relative to the code itself, which is what is assumed.
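To illustrate why self-relative code needs no patching, here is a small sketch using the x86 `CALL rel32` form (opcode E8), whose operand is a displacement from the address of the next instruction. The offsets and load addresses are arbitrary assumptions:

```python
import struct


def encode_call_rel32(call_offset, target_offset):
    """Encode x86 CALL rel32 between two offsets inside the same code blob."""
    next_instruction = call_offset + 5  # opcode E8 plus a 4-byte displacement
    displacement = target_offset - next_instruction
    return b"\xE8" + struct.pack("<i", displacement)


def call_target(load_base, call_offset, encoded):
    """Address the CPU would jump to when the blob is loaded at load_base."""
    (displacement,) = struct.unpack("<i", encoded[1:5])
    return load_base + call_offset + 5 + displacement


encoded = encode_call_rel32(call_offset=0x10, target_offset=0x40)

# The same bytes reach the same function wherever the blob is loaded.
for base in (0x00401000, 0x7F000000):
    print(hex(base), "->", hex(call_target(base, 0x10, encoded)))
```

The encoded bytes never change; only the base they are interpreted against does.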

Data referenced from code is a different matter. If you want to be able to move the data, some form of table is typical: the table sits at a fixed position relative to the code (or is found through some other linked mechanism) and records where the data starts, or where specific items or markers in the data start, so that other data references can be made relative to that.
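A rough sketch of such a table, loosely in the spirit of the global offset table used by ELF shared libraries. Memory is modelled as a dictionary, and `TABLE_OFFSET`, `load` and `resolve` are hypothetical names made up for this illustration:

```python
TABLE_OFFSET = 0x2000  # hypothetical: the table sits 0x2000 bytes after the code start


def load(code_base, data_base):
    """Simulate a loader: place code and data anywhere, then fill in the table slot."""
    return {code_base + TABLE_OFFSET: data_base}


def resolve(memory, code_base, item_offset):
    """What the code does at run time: read the data base from the table, then add an offset."""
    data_base = memory[code_base + TABLE_OFFSET]
    return data_base + item_offset


# The code only knows "the table is TABLE_OFFSET bytes from me" and "the item is 9 bytes into the data".
memory = load(code_base=0x10000000, data_base=0x50000000)
print(hex(resolve(memory, 0x10000000, 9)))
```

Only the table slot depends on where the data ended up; the code itself stays byte-for-byte identical.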

Upvotes: 1

vitsoft

Reputation: 5805

how does the program know what these memory addresses are?

The program (and its author) does not know what the memory address will be when it's loaded into memory; it only knows where the placeholder is, relative to the start of its segment. That's why the compiler accompanies each such placeholder with a relocation record. A relocation is a piece of information which tells the linker or the OS loader:

  1. where the relocated address is (its offset in code or data segment)
  2. which segment it is in
  3. which segment or symbol it refers to
  4. what kind of relocation should be applied to the address
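As a rough sketch, the four items above could be modelled as one record. The field names and the `"ABS32"` kind are invented for illustration; real object formats such as COFF and ELF define their own layouts and type codes:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relocation:
    offset: int   # 1. where the address to patch sits inside its segment
    segment: str  # 2. the segment that contains that address
    target: str   # 3. the segment or symbol the address refers to
    kind: str     # 4. how to patch it, e.g. absolute 32-bit or PC-relative


# Hypothetical record: a 32-bit absolute address at .text+3 that points into .data
reloc = Relocation(offset=0x3, segment=".text", target=".data", kind="ABS32")
print(reloc)
```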

Consider the following simple piece of source code for a Windows Portable Executable (PE) program:

[.text]
Main:NOP
     LEA ESI,[Mem]
     ; more instructions 
[.data]
     DB "Some data"
Mem: DB "Other data"

which will be converted to machine instructions and memory data:

|[.text]                   |[.text]
|00000000:90               |Main:NOP
|00000001:8D35[09000000]   |     LEA ESI,[Mem]
|00000007:                 |     ; more instructions
|[.data]                   |[.data]
|00000000:536F6D6520646174~|     DB "Some data"
|00000009:4F74686572206461~|Mem: DB "Other data"

The compiler does not know the virtual address of Mem; it only knows that Mem is located 0x00000009 bytes from the start of the .data segment. So it puts this temporary number into the operation code of LEA ESI,[Mem] and creates a relocation for the placeholder (located in segment .text at offset 0x00000003), which is relative to segment .data.

At link time the linker decides that the .text segment will be loaded at virtual address 0x00401000 and the .data segment at VA 0x00402000. The linker then reads the relocation record and modifies the placeholder by adding 0x00402000. The instruction LEA ESI,[Mem] in the linked executable will then be 8D3509204000; its last four bytes are the little-endian encoding of 0x00402009, the final fixed-up virtual address of Mem. We'll be able to see that address in a debugger at run time.
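The fix-up just described can be reproduced in a few lines. This is only a sketch of the arithmetic, not how any particular linker is implemented; `apply_abs32` is a hypothetical helper name:

```python
import struct


def apply_abs32(section, offset, target_base):
    """Add target_base to the little-endian 32-bit placeholder at offset."""
    (placeholder,) = struct.unpack_from("<I", section, offset)
    patched = bytearray(section)
    struct.pack_into("<I", patched, offset, (placeholder + target_base) & 0xFFFFFFFF)
    return bytes(patched)


# .text as emitted by the compiler: NOP, then LEA ESI,[Mem] with placeholder 9
text = bytes.fromhex("90" "8D3509000000")
DATA_VA = 0x00402000  # address the linker chose for .data

linked = apply_abs32(text, offset=3, target_base=DATA_VA)
print(linked.hex().upper())  # 908D3509204000
```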

Relocations are present in linked executable files, too (16-bit DOS MZ or Windows PE), in case they cannot be loaded at the image base address assumed at link time. Linking SO libraries in Linux is more complicated; see chapter 2, Dynamic linking, in http://www.skyfree.org/linux/references/ELF_Format.pdf
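For the load-time case (an executable that cannot be placed at its link-time image base), the loader's job can be sketched as adding the difference between the actual and the assumed base to every recorded address. The image base 0x00400000 and the new base 0x00A50000 are assumptions for this example, and the flat list of offsets is a simplification of how PE actually stores its base relocations:

```python
import struct


def rebase(image, reloc_offsets, preferred_base, actual_base):
    """Add the load delta to every 32-bit absolute address listed in reloc_offsets."""
    delta = actual_base - preferred_base
    patched = bytearray(image)
    for offset in reloc_offsets:
        (value,) = struct.unpack_from("<I", patched, offset)
        struct.pack_into("<I", patched, offset, (value + delta) & 0xFFFFFFFF)
    return bytes(patched)


# Linked code: NOP, then LEA ESI,[0x00402009]; the absolute address starts at offset 3
code = bytes.fromhex("908D3509204000")
moved = rebase(code, [3], preferred_base=0x00400000, actual_base=0x00A50000)
print(moved.hex().upper())
```

When the image does load at its preferred base, the delta is zero and nothing needs to change.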

Upvotes: 4
