Sylares

Reputation: 21

In full virtualization context, what happens on guest OS system calls?

I'm getting confused about protection rings, especially in the context of virtualization. Can someone help demystify the following:

  1. Is my understanding correct that protection rings (or similar concepts on non-x86 chips) are enforced at the hardware (circuit?) level, regardless of the instruction: if an instruction's operation requires a higher privilege than the current CPU mode, it triggers an interrupt?
  2. If a system call sets the privilege level to ring 0 so that the code in the interrupt handler can run, what happens under full virtualization? Does a system call from the guest OS actually set the privilege level to ring 1, with the actual implementation carried out by further interrupts? If so, how is that achieved? If not, how does the VMM ensure that the guest OS kernel runs in a less privileged mode?

Thanks in advance!

Upvotes: 1

Views: 359

Answers (1)

Margaret Bloom

Reputation: 44068

Point 1 is correct.

Point 2 works by introducing a new privilege level somewhat higher than ring 0. In the infosec community (where architecture knowledge is approximate) this new level is called "ring -1". On x86 (Intel VT-x) it is implemented as a pair of modes: VMX root mode, in which the VMM runs, and VMX non-root mode, in which the guest runs. VMX non-root mode is entered every time the VMM launches or resumes a VM (with the vmlaunch or vmresume instruction on Intel CPUs).

When in VMX non-root mode, the CPU generates a VM exit whenever the guest executes an instruction that the VMM needs to intercept, typically one that could observe or change the processor configuration.
These are called sensitive instructions in the framework of Popek and Goldberg.

Note that sensitive instructions are not the same as privileged ones: the main goal of the VMM is to virtualize the environment, not to protect the guest kernel from the guest userspace. So an instruction like sgdt, which is legal in user space (ring 3), can cause a VM exit in VMX non-root mode because the VMM may have to fake the guest GDT. Analogously, an in instruction, which is legal in kernel space (ring 0), can cause a VM exit in VMX non-root mode because the VMM has to fake the hardware.
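To make the distinction concrete, here is a toy model (not real hardware behavior) in which the ring check and the VM-exit check are two independent tests. The instruction table, the `vmm_intercepts` flag and the `execute` function are all hypothetical names made up for illustration, and treating `in` as strictly ring 0 is a simplification (real x86 uses IOPL and the I/O permission bitmap).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    name: str
    max_ring: int         # least privileged ring allowed to run it (3 = any ring)
    vmm_intercepts: bool  # assumption: the VMM asked for a VM exit on this one


# Illustrative table only; a real VMM configures intercepts in the VMCS.
TABLE = {
    "add":  Instr("add", 3, False),   # neither privileged nor sensitive
    "sgdt": Instr("sgdt", 3, True),   # unprivileged but sensitive
    "in":   Instr("in", 0, True),     # privileged (simplified) and sensitive
}


def execute(name: str, ring: int, non_root: bool) -> str:
    """Return what the toy CPU does: 'fault', 'vm_exit' or 'run'."""
    instr = TABLE[name]
    # Ring check first: privilege faults are delivered inside the guest.
    if ring > instr.max_ring:
        return "fault"
    # Separate, orthogonal check: only applies in VMX non-root mode.
    if non_root and instr.vmm_intercepts:
        return "vm_exit"
    return "run"
```

In this sketch `execute("sgdt", 3, True)` exits to the VMM even though ring 3 is allowed to run it, while `execute("in", 3, True)` faults inside the guest without involving the VMM at all.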

The rule of thumb is that the "exits" should be as few as possible, but enough to keep the guest working correctly and to prevent it from corrupting the host.

So you see that this new mechanism, despite being called "ring -1", doesn't really fit into the ring levels.
Instead, it is orthogonal to them and based on the division between VMX non-root mode (the VM) and VMX root mode (the VMM, which is a kernel-space program and lives in the host ring levels).
VMX non-root mode "generates" its own ring levels in a sense: guest instructions are executed directly by the CPU (no matter how deeply nested the VM is), like those of any other program on the host, and are thus subject to the ring hierarchy like any other instruction. The VMM, however, controls which instructions cause a VM exit and can thus simulate/fake/virtualize them (protecting itself, the hardware and the host OS).
This lets the VMM have the CPU load the ring protection state the guest OS expects every time the guest runs, and switch back to the host's state on every VM exit.
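The entry/exit cycle can be sketched as a small simulation. This is only a model under stated assumptions: `run_guest`, the `set_ring` pseudo-instruction (standing in for guest transitions such as a system call or a return to user mode) and the trace format are hypothetical, and a real VMM keeps this state in the VMCS rather than in dictionaries.

```python
HOST = {"mode": "root", "ring": 0}  # the VMM's own state (hypothetical layout)


def run_guest(program, intercepted):
    """Run a list of (op, arg) pairs, bouncing to the VMM on intercepted ops."""
    cpu = dict(HOST)
    guest = {"ring": 0, "pc": 0}  # saved guest state
    trace = []
    while guest["pc"] < len(program):
        # VM entry: the CPU loads the guest's saved ring level.
        cpu = {"mode": "non-root", "ring": guest["ring"]}
        trace.append(f"entry ring={cpu['ring']}")
        exit_reason = None
        while guest["pc"] < len(program) and exit_reason is None:
            op, arg = program[guest["pc"]]
            if op in intercepted:
                exit_reason = op          # sensitive instruction: leave the guest
            else:
                if op == "set_ring":      # guest changes its own ring freely
                    cpu["ring"] = arg
                guest["pc"] += 1
        # VM exit (or end of program): save guest state, restore host state.
        guest["ring"] = cpu["ring"]
        cpu = dict(HOST)
        if exit_reason is not None:
            trace.append(f"exit reason={exit_reason}")
            guest["pc"] += 1              # VMM emulated it; skip the instruction
    return trace, guest, cpu
```

For example, running `[("set_ring", 3), ("sgdt", None), ("nop", None)]` with `{"sgdt"}` intercepted enters the guest at ring 0, exits on `sgdt`, and re-enters at ring 3: the guest's ring changes never touch the host's ring 0 state.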

Nested paging (EPT on Intel, NPT on AMD) is another protection mechanism, set up by the VMM, that is worth knowing about.
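As a rough illustration of the idea, nested paging adds a second translation stage that only the VMM controls. The sketch below uses single-level dictionaries as page tables, which is a heavy simplification (real tables are multi-level, and the guest's own page-table walks are themselves translated); `translate` and `guest_access` are hypothetical names.

```python
PAGE_SIZE = 4096


def translate(addr: int, table: dict) -> int:
    """Map an address through a {page_number: frame_number} table."""
    page, offset = divmod(addr, PAGE_SIZE)
    if page not in table:
        # Stands in for a page fault (stage 1) or a nested-paging violation (stage 2).
        raise LookupError(f"page {page} not mapped")
    return table[page] * PAGE_SIZE + offset


def guest_access(gva: int, guest_table: dict, nested_table: dict) -> int:
    """Guest-virtual -> guest-physical -> host-physical."""
    gpa = translate(gva, guest_table)    # stage 1: controlled by the guest OS
    return translate(gpa, nested_table)  # stage 2: controlled by the VMM
```

Whatever the guest writes into its own table, it can only reach host frames that the VMM has placed in the nested table; anything else raises the error that models an exit to the VMM.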

Upvotes: 2
