CppLearner

Reputation: 17040

Putting CPU and Memory Management model all together

WARNING: This is long but I hope it can be useful for people like me in the future.

I think I know what a program counter is, how lazy memory allocation works, what the MMU does, how a virtual memory address is mapped to a physical address, and the purpose of the L1 and L2 caches. What I really have trouble with is how they all fit together at a high level when we run a C program.

Suppose I have this C code:

#include <stdio.h>
#include <stdlib.h>
int main()
{
    int* ptr;
    int n = 1000000, i = 0;

    // Dynamically allocate memory using malloc()
    ptr = (int*)malloc(n * sizeof(int));

    ptr[0] = 99;
    i += 100;
    printf("%d\n", ptr[0]);
    free(ptr);
    return 0;
}

So here is my attempt to put everything together:

  1. After execve() is called, part of the executable is loaded into memory, e.g. the text and data segments, but most of the code is not; it is loaded on demand (demand paging).

  2. The address of the first instruction is in the process table's program counter (PC) field as well as physically in the PC register, ready to be used.

  3. As the CPU executes instructions, the PC is updated (usually advanced to the next instruction, but a jump can go to a different address).

  4. Enter the main function: ptr, n, and i are on the stack.

  5. Next, when we call malloc, the C library will ask the OS (I think via the sbrk() system call, or is it mmap()?) to allocate some memory on the heap.

  6. malloc succeeds in this case, returning a virtual memory address (VMA), but the physical memory may not have been allocated yet. The page table doesn't contain the VMA, so when the CPU tries to access that VMA, a page fault will be generated.

  7. In our case, when we do ptr[0] = 99, the CPU raises a page fault. I am not sure whether the entire array is allocated or just the first page (4 KB), though.

But now I don't know how to put cache access into the picture. How does i get put into the L1 cache? How does that relate to the VMA?

Sorry if this is confusing. I just hope someone could help walk me through the entire process...

Upvotes: 4

Views: 326

Answers (1)

Erik Eidt

Reputation: 26636

Before the program runs, the operating system and the C runtime set up the necessary values in the CPU registers.

As you've already noted, the operating system (e.g. the loader) determines the intended PC value, and the CPU's PC (aka IP) register is then set, probably by a "return from interrupt" instruction that both switches to user mode (activating the virtual memory map for that process) and loads the CPU with the proper PC value (a virtual address).

In addition, the SP register is set somehow: in some systems this is done similarly to the PC, during the "return from interrupt", but in other (older) systems the user code sets the SP to a prearranged location. In either case the SP also holds a virtual memory address.

Usually the first instruction that runs in the user process is in a routine traditionally called _start, in a library called crt0 (C RunTime 0, aka startup). _start is usually written in assembly and handles the transition from the operating system to user mode. As needed, _start will establish anything else necessary for C code to be called, and then call main. If main returns to _start, it will do an exit syscall.

The CPU caches (and probably the TLBs) will be cold when _start's first instruction gets control. All addresses in user mode are virtual memory addresses that designate memory within the (virtual) address space of the process. The processor is running in user mode. The operating system has probably preloaded the page holding _start (or at least the start of _start). So when the processor performs an instruction fetch from _start, it will probably TLB miss, but not page fault, and then cache miss.

The TLB is a set of registers forming a cache in the CPU that supports virtual to physical address translations/mappings. The TLB, when it misses, will be loaded from a structure in the virtual memory mapping for the process, such as the page tables. Since that first page is preloaded, the attempt to map will succeed, and the TLB will then be filled with the proper mapping from the virtual PC page to the physical page. However, the L1, L2, etc. caches are also cold, so the access next causes a cache miss. The memory system will satisfy the cache miss by filling a cache line at each level. Finally an instruction word or group of words is provided to the processor, and it begins executing instructions.
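To make "cold cache, miss, fill a line, then hit" concrete, here is a minimal sketch of a direct-mapped cache in C. It is a toy model, not a description of any real CPU: the names ToyCache and toy_cache_access, the 64-byte line size, and the 256 sets are all assumptions chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy parameters (assumptions, not any real CPU): 64-byte lines, 256 sets. */
enum { LINE_BYTES = 64, NUM_SETS = 256 };

/* Hypothetical direct-mapped cache: one tag per set, no data stored. */
typedef struct {
    bool     valid[NUM_SETS];
    uint64_t tag[NUM_SETS];
} ToyCache;

/* Returns true on a hit. On a miss it "fills" the line, replacing whatever
   was in that set, so the next access to the same line hits. */
bool toy_cache_access(ToyCache *c, uint64_t phys_addr)
{
    assert(c != 0);

    uint64_t line = phys_addr / LINE_BYTES; /* which 64-byte line is wanted */
    uint64_t set  = line % NUM_SETS;        /* the only slot it may occupy  */
    uint64_t tag  = line / NUM_SETS;        /* identifies the line in a set */

    if (c->valid[set] && c->tag[set] == tag)
        return true;                        /* hit */

    c->valid[set] = true;                   /* miss: fill the line */
    c->tag[set]   = tag;
    return false;
}
```

Starting from a zero-initialized ToyCache (a cold cache), the first access to any address misses, and a second access anywhere within the same 64-byte line hits. Real caches are usually set-associative and have several levels, but the miss-then-fill pattern is the same idea.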

If a virtual address for code (by way of the PC) or data (by some dereference) is not present in the TLB, then the processor will consult the page tables, and a miss there can cause a recoverable or non-recoverable page fault. Recoverable page faults occur when a virtual to physical mapping is not present in the page tables, for example because the data is on disk, and operating system intervention is required; non-recoverable faults are accesses to virtual memory that are in error, i.e. not allowed because they refer to virtual memory that has not been allocated/authorized by the operating system.
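The lookup order described above (TLB first, then page tables, then a recoverable or non-recoverable fault) can be sketched as a toy model in C. Everything here is hypothetical and simplified: the names ToyMmu, ToyPte, toy_translate and toy_os_handle_fault, the single-level 16-entry page table, and the 4-slot direct-mapped TLB are assumptions for illustration, not how any particular processor or OS implements it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy sizes (assumptions): 4 KB pages, 16 virtual pages, 4 TLB slots. */
enum { PAGE_SHIFT = 12, NUM_PAGES = 16, TLB_SLOTS = 4 };

typedef enum { TLB_HIT, TLB_MISS_FILLED, FAULT_RECOVERABLE, FAULT_FATAL } Outcome;

typedef struct {
    bool     allocated; /* the OS has granted this virtual page to the process */
    bool     present;   /* a physical frame currently backs the page           */
    uint32_t frame;     /* physical frame number, meaningful only if present   */
} ToyPte;

typedef struct {
    ToyPte   pt[NUM_PAGES];        /* single-level page table */
    bool     tlb_valid[TLB_SLOTS];
    uint32_t tlb_vpn[TLB_SLOTS];
    uint32_t tlb_frame[TLB_SLOTS];
} ToyMmu;

/* Translates vaddr; writes *paddr only when the result is not a fault. */
Outcome toy_translate(ToyMmu *m, uint32_t vaddr, uint32_t *paddr)
{
    uint32_t vpn  = vaddr >> PAGE_SHIFT;               /* virtual page number */
    uint32_t off  = vaddr & ((1u << PAGE_SHIFT) - 1u); /* offset within page  */
    uint32_t slot = vpn % TLB_SLOTS;

    if (vpn >= NUM_PAGES)
        return FAULT_FATAL;                   /* outside the address space */

    if (m->tlb_valid[slot] && m->tlb_vpn[slot] == vpn) {
        *paddr = (m->tlb_frame[slot] << PAGE_SHIFT) | off;
        return TLB_HIT;
    }

    ToyPte *pte = &m->pt[vpn];                /* TLB miss: consult page table */
    if (!pte->allocated)
        return FAULT_FATAL;                   /* never authorized by the OS */
    if (!pte->present)
        return FAULT_RECOVERABLE;             /* OS must supply a frame, then retry */

    m->tlb_valid[slot] = true;                /* fill the TLB from the page table */
    m->tlb_vpn[slot]   = vpn;
    m->tlb_frame[slot] = pte->frame;
    *paddr = (pte->frame << PAGE_SHIFT) | off;
    return TLB_MISS_FILLED;
}

/* Stand-in for the OS fault handler: back the faulting page with a frame. */
void toy_os_handle_fault(ToyMmu *m, uint32_t vaddr, uint32_t frame)
{
    uint32_t vpn = vaddr >> PAGE_SHIFT;
    assert(vpn < NUM_PAGES && m->pt[vpn].allocated);
    m->pt[vpn].frame   = frame;
    m->pt[vpn].present = true;
}
```

In this model, an access that returns FAULT_RECOVERABLE is simply retried after toy_os_handle_fault runs; the retry then takes the TLB_MISS_FILLED path, and later accesses to the same page return TLB_HIT.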

Variable i is known to main as a stack-relative location. So, when main wants to write to i, it will write to memory at an offset from SP, e.g. SP+8 (i could also be a register variable, but I digress). Since the SP is a pointer holding a virtual memory address, i then has a virtual address. That virtual address goes through the steps described above: TLB mapping from virtual page to physical page, possible page faulting, and then a possible cache miss. Subsequent accesses will yield TLB hits and cache hits, so as to run at full speed. (The operating system will probably also preload some but not all stack pages before running the process.)
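You can see that i has an ordinary virtual address by taking &i and splitting it into a page number and an offset within the page, which is the split the TLB and page tables work with. The sketch below assumes a POSIX system (it uses sysconf(_SC_PAGESIZE)); the PageSplit type and split_address function are names made up for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical helper type: a virtual address split at the page boundary. */
typedef struct {
    uintptr_t page;   /* virtual page number     */
    uintptr_t offset; /* byte offset within page */
} PageSplit;

/* Splits a pointer's virtual address using the system page size. */
PageSplit split_address(const void *p)
{
    long page_size = sysconf(_SC_PAGESIZE);
    assert(page_size > 0);

    uintptr_t addr = (uintptr_t)p;
    PageSplit s;
    s.page   = addr / (uintptr_t)page_size;
    s.offset = addr % (uintptr_t)page_size;
    return s;
}
```

Calling split_address(&i) inside main gives the virtual page that holds i; only the page number is translated by the TLB, while the offset passes through unchanged. The values themselves will differ from run to run on systems that randomize the stack location.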

A malloc operation will use some system calls that ultimately cause additional virtual memory to be added to the process. (Though as you also note, malloc gets more than enough for the current request, so the system calls are not done on every malloc call.) malloc will return a virtual memory address, i.e. a pointer in the user mode virtual address space. For memory just obtained by a system call, the TLB and caches are also probably cold, and it is possible that the page is not even loaded yet. In the latter case, a recoverable page fault will happen and the OS will allocate a physical page to use. If the OS is smart it will know that this is a new data page, and so can fill it with zeros instead of loading it from the paging file. Then it will set up the page table entries for the proper mapping and resume the user process, which will probably then TLB miss, fill a TLB entry from the page tables, and then cache miss and fill cache lines from the physical page.
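If you want to observe this yourself, one rough experiment is to count the process's minor page faults before and after touching a freshly allocated buffer. The sketch below assumes a POSIX system where getrusage reports ru_minflt (Linux does); the function names minor_faults, touch_pages and faults_during_first_touch are made up for this example, and the exact counts depend on the allocator and kernel configuration, so treat the numbers as indicative only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/resource.h>
#include <unistd.h>

/* Minor (no disk I/O) page faults taken by this process so far. */
long minor_faults(void)
{
    struct rusage ru;
    int rc = getrusage(RUSAGE_SELF, &ru);
    assert(rc == 0);
    (void)rc;
    return ru.ru_minflt;
}

/* Writes one byte in every page of buf; returns the number of pages touched. */
size_t touch_pages(volatile char *buf, size_t bytes)
{
    long page_size = sysconf(_SC_PAGESIZE);
    assert(page_size > 0);

    size_t touched = 0;
    for (size_t off = 0; off < bytes; off += (size_t)page_size) {
        buf[off] = 1; /* first write to a page may trigger a page fault */
        touched++;
    }
    return touched;
}

/* Allocates `bytes` with malloc, touches every page, and returns how many
   minor faults occurred while touching (or -1 if malloc failed). */
long faults_during_first_touch(size_t bytes)
{
    char *buf = malloc(bytes);
    if (buf == NULL)
        return -1;

    long before = minor_faults();
    touch_pages(buf, bytes);
    long after = minor_faults();

    free(buf);
    return after - before;
}
```

For example, faults_during_first_touch(1000000 * sizeof(int)) uses the same allocation size as the program in the question. On a typical Linux system I would expect the result to grow with the number of pages touched rather than with the size requested, which suggests that physical pages are supplied one fault at a time, not for the whole array at once.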

Upvotes: 3
