aky

Reputation: 83

Implementing spinlock functionality in xv6 so that spinlock APIs can be used from user level

xv6 has a spinlock.c file that implements spinlocks for kernel usage, but I need to implement spinlock APIs that can be used at user level. For example, I will implement sp_create() to create a spinlock at user level, sp_acquire(int id) to take the lock, and so on. To do that I am supposed to create system calls and place the actual implementation in the kernel; xv6 has spinlock functionality, but only for use at kernel level.
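Roughly, the user-level API I have in mind looks like this (a sketch; the exact signatures are not final, and sp_release() is the obvious counterpart to sp_acquire()):

int sp_create(void);      // create a spinlock, return its id
int sp_acquire(int id);   // take the lock with this id
int sp_release(int id);   // release the lock with this id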

I thought about creating system calls that simply call the corresponding functions in spinlock.c to create the lock, acquire it, release it, etc. But that does not work, due to issues with interrupt disabling.

I am copying below the code I wrote so far:

// system call for lock_take():
int l_take(int lockid) {

  struct proc *curproc = myproc();

  // process will take lock
  // ...
  acquire(&LL.arraylockList[lockid].spnLock);
  // ...
  return 0;
}
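For context, LL is a kernel-side table of locks along these lines (a sketch; I am omitting the real declaration):

struct lockEntry {
  struct spinlock spnLock;                 // the underlying kernel spinlock
  int inUse;                               // set when sp_create() hands out this slot
};

struct {
  struct lockEntry arraylockList[NLOCKS];  // NLOCKS: assumed table size
} LL;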

The problem I am hitting is that it gives me the error panic: sched locks. I think that is because acquire() has pushcli() in it:

void
acquire(struct spinlock *lk)
{
    pushcli(); // disable interrupts to avoid deadlock.
    if (holding(lk)) panic("acquire");

    // The xchg is atomic.
    while (xchg(&lk->locked, 1) != 0)
        ;

    // Tell the C compiler and the processor to not move loads or stores
    // past this point, to ensure that the critical section's memory
    // references happen after the lock is acquired.
    __sync_synchronize();

    // Record info about lock acquisition for debugging.
    lk->cpu = mycpu();
    getcallerpcs(&lk, lk->pcs);
}

Then I copied the code into a new function acquire2(), with pushcli() commented out, and used that in my system call:

void
acquire2(struct spinlock *lk)
{
    // pushcli(); // disable interrupts to avoid deadlock.
    if (holding(lk)) panic("acquire");

    // The xchg is atomic.
    while (xchg(&lk->locked, 1) != 0)
        ;

    // Tell the C compiler and the processor to not move loads or stores
    // past this point, to ensure that the critical section's memory
    // references happen after the lock is acquired.
    __sync_synchronize();

    // Record info about lock acquisition for debugging.
    lk->cpu = mycpu();
    getcallerpcs(&lk, lk->pcs);
}

However, then the error message changes to this: panic: mycpu() called with interrupts enabled

It turns out that disabling interrupts is not allowed, so pushcli() and popcli() should not be used. Then I need to figure out how to run mycpu() atomically. Its implementation is like this:

// Must be called with interrupts disabled to avoid the caller being rescheduled
// between reading lapicid and running through the loop.
struct cpu *
mycpu(void)
{
    int apicid, i;

    if (readeflags() & FL_IF) panic("mycpu called with interrupts enabled\n");

    apicid = lapicid();
    // APIC IDs are not guaranteed to be contiguous. Maybe we should have
    // a reverse map, or reserve a register to store &cpus[i].
    for (i = 0; i < ncpu; ++i) {
        if (cpus[i].apicid == apicid) return &cpus[i];
    }
    panic("unknown apicid\n");
}

The for loop and the line above it need to execute atomically. How do I do it?

Upvotes: 1

Views: 3198

Answers (1)

Brendan

Reputation: 37222

Working backwards (starting at the end of your question)...

How do I do it?

If you must do it, you'd start by finding an alternative way that doesn't involve a loop.

There are probably many possible alternatives (assuming 80x86): using a CPU-local structure found via GS or FS somehow, using a different TSS per CPU and using that to find the CPU number, using the RDTSCP instruction after making sure it's supported and configured to suit, stealing/repurposing a register like DR3, using paging to create a "CPU-specific zone", ...
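For example, here's a minimal sketch of the RDTSCP option; it assumes boot code stored each CPU's index in the IA32_TSC_AUX MSR and that CPUID reported RDTSCP support (mycpu_rdtscp is an illustrative name):

struct cpu *
mycpu_rdtscp(void)
{
    unsigned int lo, hi, cpu_index;

    // RDTSCP loads the timestamp into EDX:EAX and IA32_TSC_AUX into ECX,
    // so a CPU can find itself without a loop and without disabling IRQs.
    asm volatile("rdtscp" : "=a"(lo), "=d"(hi), "=c"(cpu_index));
    (void)lo; (void)hi;                 // the timestamp isn't needed here
    return &cpus[cpu_index];
}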

The for loop and the line above it need to execute atomically.

Why? Assuming that the table is generated during boot and never changed after that (and that there's no hot-plug CPU support); or assuming that the table only changes in a way that ensures it's correct when a CPU needs to find itself (recognizing that a CPU that has been taken offline can't try to find itself); there's no reason (that I can see) that the loop needs to be atomic.

It turns out that disabling interrupts is not allowed.

Note: This isn't right and should be "It turns out that interrupts MUST be disabled".

That's because the OS is using "disable IRQs" to disable/postpone task switches. This isn't a good or necessary approach: a kernel can have its own flag/variable somewhere else to disable/postpone task switches without disabling IRQs, plus 2 different types of spinlocks, where one is used by code that may be executed by IRQ handlers (and disables both IRQs and task switches) and the other is never used by code that may be executed by IRQ handlers (and only disables/postpones task switches, never IRQs). Of course this would require some major changes to the kernel.
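A minimal sketch of that second spinlock type, assuming hypothetical preempt_disable()/preempt_enable() helpers that adjust a per-CPU counter the scheduler checks before switching tasks:

void
acquire_no_irq(struct spinlock *lk)
{
    preempt_disable();                  // postpone task switches; IRQs stay on
    while (xchg(&lk->locked, 1) != 0)
        ;
    __sync_synchronize();               // fence: critical section starts here
}

void
release_no_irq(struct spinlock *lk)
{
    __sync_synchronize();               // fence: publish the section's writes
    lk->locked = 0;
    preempt_enable();                   // allow task switches again
}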

Note that for the code you've posted it's extremely likely that acquiring a spinlock causes all IRQs to be disabled, and then releasing the spinlock causes IRQs to be re-enabled again. This means (if you let user-space use the kernel's code via a syscall) user-space can acquire any spinlock, then hog 100% of CPU time forever (by never releasing the spinlock). In other words, what you're planning to do will create a massive "denial of service" security vulnerability.

I thought about creating system calls which are actually calling the corresponding functions in spinlock.c to create the lock, acquire it, release it, etc.

This is the cause of all your problems.

Instead of creating a system call, just create some spinlock code for user-space. You can do that by copying (with "copy and paste") the kernel's functions directly into your user-space code (or maybe a library) and deleting all the irrelevant stuff to end up with something like:

void
my_acquire(unsigned int *lk)
{
    // The xchg is atomic.
    while (xchg(lk, 1) != 0)
        ;

    // Tell the C compiler and the processor to not move loads or stores
    // past this point, to ensure that the critical section's memory
    // references happen after the lock is acquired.
    __sync_synchronize();
}

Note: For that to work you'll also need to copy the code for xchg() to user-space (__sync_synchronize() is a GCC builtin, so it works in user-space as-is); but that shouldn't be a problem.
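For reference, here's xv6's xchg() from x86.h (with uint spelled out as unsigned int, since user code may not include xv6's types.h), plus a matching release; my_release is an assumed name to pair with my_acquire:

static inline unsigned int
xchg(volatile unsigned int *addr, unsigned int newval)
{
    unsigned int result;

    // The + in "+m" denotes a read-modify-write operand.
    asm volatile("lock; xchgl %0, %1"
                 : "+m" (*addr), "=a" (result)
                 : "1" (newval)
                 : "cc");
    return result;
}

void
my_release(unsigned int *lk)
{
    __sync_synchronize();   // publish the critical section's writes first
    *lk = 0;                // then mark the lock as free
}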

However, I should point out that the code in the kernel is not good. Typically you want to use a "test; then atomically test and set" approach, so that (under contention) you're not doing anything atomic and are just testing in a loop; and (for 80x86) you want a pause instruction in the loop to improve "loop exit speed" and to help the other logical CPU in the core (for hyper-threading); and (for the kernel) you don't want to disable IRQs while testing, only for the "atomic test and set", to avoid damaging "IRQ latency" for no reason.
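For the user-space half of that, a "test; then atomically test and set" loop with pause looks roughly like this (a sketch; the name is illustrative):

void
my_acquire_ttas(volatile unsigned int *lk)
{
    for (;;) {
        // Plain reads first: under contention this spins on a cached value
        // instead of hammering the bus with atomic operations.
        while (*lk != 0)
            asm volatile("pause");      // loop-exit and hyper-threading hint

        // The lock looked free; now do the atomic test and set.
        if (xchg(lk, 1) == 0)
            break;
    }
    __sync_synchronize();
}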

Implementing spinlock at user level

This is also likely to be a bad idea in general.

The problem is that (if it's not a massive security disaster) a task switch can occur while you're holding the lock, causing all other tasks (on potentially many CPUs) to waste their entire time slices spinning without any hope of acquiring the lock, because the lock is already held by a task that is not running, and that task won't run while the others are busy wasting CPU time spinning for nothing.

The solution to this problem (that isn't a massive security disaster) is mutexes/semaphores. Specifically, the solution is to be able to tell the kernel's scheduler "don't give this task CPU time until it can acquire the mutex/semaphore it needs". Of course this can be implemented such that the "happy case" (when there's no contention at all) is handled purely in user-space, and the kernel (and the overhead of a syscall, etc) is only involved if/when actually necessary.
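A minimal sketch of that idea, assuming hypothetical sys_wait_on()/sys_wake() syscalls that block a task until another task wakes it (this is the general shape of Linux's futex):

void
mutex_lock(volatile unsigned int *m)
{
    // Fast path: the uncontended case never enters the kernel.
    if (xchg(m, 1) == 0)
        return;

    // Slow path: mark the mutex as contended (2) and ask the kernel to
    // block us until the current holder wakes us.
    while (xchg(m, 2) != 0)
        sys_wait_on(m);                 // hypothetical "sleep until woken"
}

void
mutex_unlock(volatile unsigned int *m)
{
    if (xchg(m, 0) == 2)                // a waiter is (or may be) blocked
        sys_wake(m);                    // hypothetical "wake one waiter"
}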

Upvotes: 1
