Dan

Reputation: 2876

How does an OS such as Linux load executables into virtual memory?

I have read this statement from this link:

Executable Object Files and Virtual Memory

When an executable file is started, the OS (kernel) creates a virtual address space and an (initially empty) process, and examines the executable file's header.

But I don't understand how examining the executable file's header happens.

Shouldn't the binary be loaded into memory to begin with, before the OS can examine the executable file's header? The CPU can't run instructions directly from the HDD.

I'm guessing the loader should be able to see the addresses assigned to the binary during compilation and map them into the newly created virtual address space.

Also, in case the binary is loaded by the OS, does it get loaded entirely? Or does it do lazy loading and load pages as needed afterwards? How much would it load initially?

Upvotes: 2

Views: 1111

Answers (1)

Nate Eldredge

Reputation: 58132

Shouldn't the binary be loaded in memory to begin with, before the OS can examine the executable file's header?

Well, only the header of the binary has to be loaded into memory for this step. The kernel loads the header and inspects it to see how to set up mappings for the various sections of the binary. The header might say, for instance, "map bytes 4096-65535 of the binary into memory at address 0x12345000, read-only and executable"; "map 16384 bytes of zero-initialized memory at address 0xdeadf000, read-write", and so on. After these mappings are set up, the kernel doesn't need to keep the binary's header in memory anymore, and can free that space.
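To make the "inspect the header, then set up mappings" step concrete, here is a minimal user-space sketch that parses only the header area of a little-endian ELF64 image and lists the mappings it requests. It is an illustration of the idea, not the kernel's actual code; the function name `load_segments` and the dictionary keys are made up for this example, and it assumes the program headers are contained in the bytes passed in.

```python
import struct

# Layouts from the ELF64 format (little-endian); no padding with "<".
ELF_HEADER = struct.Struct("<16sHHIQQQIHHHHHH")   # 64-byte file header
PROGRAM_HEADER = struct.Struct("<IIQQQQQQ")       # 56-byte program header
PT_LOAD = 1                                       # "map this segment" entry
PF_X, PF_W, PF_R = 1, 2, 4                        # permission bits


def load_segments(image):
    """Hypothetical helper: return (entry_point, list of requested mappings)."""
    fields = ELF_HEADER.unpack_from(image, 0)
    ident = fields[0]
    if ident[:4] != b"\x7fELF" or ident[4] != 2 or ident[5] != 1:
        raise ValueError("not a little-endian ELF64 image")
    entry, phoff = fields[4], fields[5]
    phentsize, phnum = fields[9], fields[10]

    segments = []
    for i in range(phnum):
        (p_type, p_flags, p_offset, p_vaddr, _p_paddr,
         p_filesz, p_memsz, _p_align) = PROGRAM_HEADER.unpack_from(
            image, phoff + i * phentsize)
        if p_type != PT_LOAD:
            continue  # only PT_LOAD entries describe memory mappings
        perms = "".join(
            ch if p_flags & bit else "-"
            for ch, bit in (("r", PF_R), ("w", PF_W), ("x", PF_X)))
        segments.append({
            "file_offset": p_offset,           # where the bytes live in the file
            "vaddr": p_vaddr,                  # where they should appear in memory
            "file_bytes": p_filesz,            # bytes backed by the file
            "zero_fill": p_memsz - p_filesz,   # extra zero-initialized bytes
            "perms": perms,
        })
    return entry, segments
```

For a typical binary, something like `load_segments(open(path, "rb").read(4096))` should be enough, since the program headers usually sit right after the file header; that is an assumption about the file, not a guarantee. Note that nothing here reads the code or data of the program itself, only the description of where it should go.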

Also in case the binary is loaded by the OS, does it get loaded entirely?

No.

or it does lazy loading and loads pages as needed afterwards.

Yes.

How much would it load initially?

Potentially none at all. It can rely instead on the page fault handler to do it when the process actually accesses the memory. In that case, the sysret, or whatever instruction the kernel uses to transfer control to the program's entry point, would itself cause a page fault, at which point the page containing the first instruction at the entry point would be loaded from the binary as specified by the mapping for that address. When the fault handler returned, that first instruction would be in memory and would be executed. As the process executes more instructions touching more memory, more and more of its pages will be loaded.
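The same demand-paging behaviour can be observed from user space with an ordinary file mapping. The sketch below maps a temporary file, then touches one byte per page and reports how many page faults the process took while doing so. It assumes a Unix-like system where Python's `resource` module reports fault counts; the function names are made up for the example, and the exact numbers depend on the kernel and on whatever else the interpreter does in the meantime.

```python
import mmap
import resource
import tempfile

PAGE = mmap.PAGESIZE


def fault_count():
    """Total page faults (minor + major) taken by this process so far."""
    usage = resource.getrusage(resource.RUSAGE_SELF)
    return usage.ru_minflt + usage.ru_majflt


def touch_pages(num_pages=256):
    """Map a file, touch one byte per page, return (faults_taken, checksum)."""
    with tempfile.TemporaryFile() as f:
        f.write(b"\x90" * (PAGE * num_pages))
        f.flush()
        # Creating the mapping only records "these addresses come from this file".
        with mmap.mmap(f.fileno(), PAGE * num_pages,
                       access=mmap.ACCESS_READ) as mapping:
            before = fault_count()
            checksum = 0
            for page in range(num_pages):
                # First access to a page not yet mapped triggers a page fault,
                # and the fault handler brings the page in from the file.
                checksum += mapping[page * PAGE]
            after = fault_count()
    return after - before, checksum
```

Calling `touch_pages()` returns the number of faults taken during the loop together with a checksum of the bytes read. The fault count may be lower than the number of pages touched if the kernel maps several neighbouring pages while handling a single fault.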

The kernel could, as an optimization, prefault some of these pages into memory, based on guesses as to which ones are likely to be accessed in the near future. I don't know exactly to what extent this is done.

Upvotes: 4
