Reputation: 263
According to https://qemu.readthedocs.io/en/latest/devel/memory.html#visibility :
The memory core uses the following rules to select a memory region when the guest accesses an address:
- all direct subregions of the root region are matched against the address, in descending priority order
- if the address lies outside the region offset/size, the subregion is discarded
- if the subregion is a leaf (RAM or MMIO), the search terminates, returning this leaf region
- if the subregion is a container, the same algorithm is used within the subregion (after the address is adjusted by the subregion offset)
- if the subregion is an alias, the search is continued at the alias target (after the address is adjusted by the subregion offset and alias offset)
- if a recursive search within a container or alias subregion does not find a match (because of a “hole” in the container’s coverage of its address range), then if this is a container with its own MMIO or RAM backing the search terminates, returning the container itself. Otherwise we continue with the next subregion in priority order
- if none of the subregions match the address then the search terminates with no match found
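To check my understanding, here is how I'd sketch those rules as C-like pseudocode (my own hypothetical types and names, not anything from the QEMU source):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { MR_LEAF, MR_CONTAINER, MR_ALIAS } MRKind;

typedef struct MR {
    MRKind kind;
    uint64_t offset, size;     /* placement within the parent region */
    struct MR **subregions;    /* kept sorted by descending priority */
    int n_subregions;
    struct MR *alias_target;   /* MR_ALIAS only */
    uint64_t alias_offset;     /* MR_ALIAS only */
    bool has_own_backing;      /* container with its own RAM/MMIO */
} MR;

/* Return the region that handles 'addr' within 'mr', or NULL. */
static MR *resolve(MR *mr, uint64_t addr)
{
    for (int i = 0; i < mr->n_subregions; i++) {  /* descending priority */
        MR *sub = mr->subregions[i];
        if (addr < sub->offset || addr - sub->offset >= sub->size) {
            continue;                             /* outside: discard */
        }
        uint64_t sub_addr = addr - sub->offset;
        if (sub->kind == MR_LEAF) {
            return sub;                           /* RAM/MMIO: done */
        }
        MR *hit = (sub->kind == MR_ALIAS)
            ? resolve(sub->alias_target, sub_addr + sub->alias_offset)
            : resolve(sub, sub_addr);
        if (hit) {
            return hit;
        }
        if (sub->kind == MR_CONTAINER && sub->has_own_backing) {
            return sub;      /* hole, but the container has backing */
        }
        /* otherwise continue with the next subregion */
    }
    return NULL;                                  /* no match found */
}
```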
Does this process happen on every memory access by the guest? If so, where is this logic in the QEMU codebase, roughly?
Upvotes: 1
Views: 350
Reputation: 11383
No, it doesn't happen on every access. The documented rules above describe the observed behaviour rather than the implementation: their intended audience is a developer writing a board or SoC model, who doesn't need to know the internal details of how the memory subsystem actually uses the tree of MemoryRegions that the board and SoC code creates.
Firstly, once the tree of MemoryRegions has been built, it is analysed to produce a data structure called a FlatView. We identify (using the rules above) which leaf MemoryRegion would be hit for each part of the address space, and create the FlatView, which is essentially a list of ranges: it might say "for addresses 0 to 0x8000, MemoryRegion 1; for addresses 0x8000 to 0x10000, MemoryRegion 2", and so on. (The details are a little more complicated.) Once the FlatView has been created, memory accesses can be done quickly, because looking an address up in the FlatView to get the relevant MemoryRegion is fast. This code path is used for memory accesses when running under KVM, and for when devices do DMA to/from memory.
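As an illustration of the shape of that structure (a toy sketch with invented names, not QEMU's actual FlatView types), the flattened view is essentially a sorted, non-overlapping array of ranges, so a lookup is just a binary search:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical simplified flat view entry: one address range plus the
 * leaf MemoryRegion that the resolution rules picked for it. */
typedef struct {
    uint64_t start;   /* inclusive */
    uint64_t end;     /* exclusive */
    void *mr;         /* the winning leaf MemoryRegion */
} FlatRange;

typedef struct {
    FlatRange *ranges;  /* sorted by start, non-overlapping */
    size_t n;
} SimpleFlatView;

/* Binary search: no walk of the MemoryRegion tree on the hot path. */
void *flatview_lookup(const SimpleFlatView *fv, uint64_t addr)
{
    size_t lo = 0, hi = fv->n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (addr < fv->ranges[mid].start) {
            hi = mid;
        } else if (addr >= fv->ranges[mid].end) {
            lo = mid + 1;
        } else {
            return fv->ranges[mid].mr;   /* addr is inside this range */
        }
    }
    return NULL;                         /* a hole: no region here */
}
```

The design point is that the expensive tree walk happens once, when the view is (re)built after the region topology changes; every access after that pays only for the cheap lookup.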
Secondly, when TCG emulation does a memory access for a guest address, the first time around it has to take a slow path, but it caches the resulting MemoryRegion or host RAM address in the QEMU TLB[*]. Subsequent accesses to that page of the guest address space are then fast. In particular, for accesses to RAM which is backed by host RAM, the access is done entirely in code generated by the TCG JIT, and never has to come out to a C function.
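As a rough illustration of why the cached path is cheap (a toy direct-mapped cache with invented names, not the real code in accel/tcg/cputlb.c):

```c
#include <stdint.h>

#define PAGE_BITS 12
#define PAGE_SIZE ((uint64_t)1 << PAGE_BITS)
#define PAGE_MASK (PAGE_SIZE - 1)      /* low bits = offset within page */
#define TLB_BITS  8
#define TLB_SIZE  (1u << TLB_BITS)

/* Toy TLB entry: the guest page we cached and the host address the
 * slow path resolved it to. (A real TLB also needs an 'invalid'
 * encoding for empty entries; omitted here for brevity.) */
typedef struct {
    uint64_t  tag;        /* page-aligned guest address */
    uintptr_t host_base;  /* host address backing that guest page */
} TlbEntry;

static TlbEntry tlb[TLB_SIZE];

/* Stand-in for the slow path: in QEMU this resolves the page via the
 * FlatView and fills the TLB; purely a hypothetical helper here. */
extern uintptr_t slow_path_translate_page(uint64_t guest_page);

uint8_t guest_ldb(uint64_t guest_addr)
{
    uint64_t  page = guest_addr & ~PAGE_MASK;
    TlbEntry *e    = &tlb[(guest_addr >> PAGE_BITS) & (TLB_SIZE - 1)];

    if (e->tag != page) {              /* miss: slow path, fill entry */
        e->host_base = slow_path_translate_page(page);
        e->tag       = page;
    }
    /* Hit: an index, a compare and a load -- roughly the kind of short
     * inline sequence the TCG JIT emits for RAM-backed accesses. */
    return *(uint8_t *)(e->host_base + (guest_addr & PAGE_MASK));
}
```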
Most of the code that creates and works with the FlatView is in softmmu/physmem.c and softmmu/memory.c. The code that works with the TLB is in accel/tcg/cputlb.c. The code that generates the inline sequences for the JIT fastpath is under tcg/.
[*] The QEMU TLB is similar in purpose to a hardware TLB, in that it speeds up lookups that start with a guest address, but it is not modelling the guest CPU's TLB. This question and answer have more details.
Upvotes: 3