zephyr0110

Reputation: 225

Race condition in KVM with hypercall KVM_HC_KICK_CPU

To implement efficient spinlocks in a VM environment, the KVM documentation says that a vCPU waiting for a spinlock can execute the HLT instruction and let the spinlock-holder vCPU get a chance to run; the holder vCPU can then execute the KVM_HC_KICK_CPU hypercall to wake the waiting vCPU.

Now here is my question:

Imagine the sequence of instructions below:

CHECK_SPIN_LOCK_FLAG
// <------------ waiting vCPU gets scheduled out just before executing hlt
hlt

Now, when the spinlock-holder vCPU runs, releases the spinlock, and then tries to kick the waiter, the kick has nothing to do because that vCPU is not halted — it is merely scheduled out. When the waiting vCPU is scheduled back in, it executes the hlt instruction and sleeps with nobody left to wake it.

Is this a race condition in the design of this hypercall?

The following is an excerpt from Documentation/virt/kvm/x86/hypercalls.rst:

5. KVM_HC_KICK_CPU
------------------

:Architecture: x86
:Status: active
:Purpose: Hypercall used to wakeup a vcpu from HLT state
:Usage example:
  A vcpu of a paravirtualized guest that is busywaiting in guest
  kernel mode for an event to occur (ex: a spinlock to become available) can
  execute HLT instruction once it has busy-waited for more than a threshold
  time-interval. Execution of HLT instruction would cause the hypervisor to put
  the vcpu to sleep until occurrence of an appropriate event. Another vcpu of the
  same guest can wakeup the sleeping vcpu by issuing KVM_HC_KICK_CPU hypercall,
  specifying APIC ID (a1) of the vcpu to be woken up. An additional argument (a0)
  is used in the hypercall for future use.

Upvotes: 0

Views: 113

Answers (1)

zephyr0110

Reputation: 225

Okay, I am not sure whether I have solved the problem, but it seems there is one more hypercall:

7. KVM_HC_SCHED_YIELD
---------------------

:Architecture: x86
:Status: active
:Purpose: Hypercall used to yield if the IPI target vCPU is preempted

a0: destination APIC ID

:Usage example: When sending a call-function IPI-many to vCPUs, yield if
                any of the IPI target vCPUs was preempted.

We can use the above hypercall before kicking the vCPU, to make sure the target vCPU really is in the halted state. If it was preempted instead, we yield to it and only kick once the target has actually halted.

Solution as of now:

struct mutex_t {
    uint64_t intent_cpus;   // offset 0: CPUs intending to take the lock
    uint64_t waiting_cpus;  // offset 8: CPUs halted waiting for a kick
    uint64_t m;             // offset 16: the lock word (0 = free, 1 = held)
};

get_lock () {
    // rcx has the cpu id, rdx has the address of the mutex
retry:
    xorq %rax, %rax            // expected: 0 (free); cmpxchg clobbers rax on failure
    movq $1, %rbx              // desired: 1 (held)
    // set intent for taking the lock
    lock bts %rcx, (%rdx)
    // try to acquire m: if it was 0, store 1 and set ZF
    lock cmpxchg %rbx, 16(%rdx)
    jz got_lock
    // will be waiting for the kick
    lock bts %rcx, 8(%rdx)
    hlt
    // woken up: clear the waiting bit and try again
    lock btr %rcx, 8(%rdx)
    jmp retry
got_lock:
    // reset intent for the lock
    lock btr %rcx, (%rdx)      // bit offset in rcx, modifies CF
}

release_lock() { // rdx has the mutex address
    xor %rax, %rax
    lock xchgq %rax, 16(%rdx)  // release: m = 0
    bsf (%rdx), %rcx           // rcx = lowest set intent bit; ZF=1 if none
    jz no_cpus_with_intent
check_for_intent:
    bt %rcx, (%rdx)
    jnc cpu_does_not_have_intent
    bt %rcx, 8(%rdx)
    jc yield_once_and_return
    /* it has intent but no waiting bit yet:
       yield so that it can either halt or take the lock */
    yield_to_vcpu(%rcx)        // pseudo-call for KVM_HC_SCHED_YIELD
    jmp check_for_intent
yield_once_and_return:
    /* here it is either halted, or was preempted
       just before executing hlt */
    yield_to_vcpu(%rcx)
    kick_the_vcpu(%rcx)        // pseudo-call for KVM_HC_KICK_CPU
cpu_does_not_have_intent:
no_cpus_with_intent:
}

Upvotes: 0
