The quotation comes from http://preshing.com/20111124/always-use-a-lightweight-mutex/:
The Windows Critical Section is what we call a lightweight mutex. It’s optimized for the case when there are no other threads competing for the lock. To demonstrate using a simple example, here’s a single thread which locks and unlocks a Windows Mutex exactly one million times.
Does that mean a lightweight mutex is just a "smart" heavy (kernel) mutex? By "smart" I mean that it skips the system call only when the mutex is free?
Upvotes: 2
Views: 2446
Reputation: 244843
In summary, yes: on Windows, critical sections and mutexes are similar, but critical sections are lighter weight because they avoid a system call when there is no contention.
Windows has two different mutual-exclusion primitives: critical sections and mutexes. They serve similar functions, but critical sections are significantly faster than mutexes.
Mutexes always result in a system call down to the kernel, which requires a processor ring-mode switch and entails a significant amount of overhead. (The user-mode thread executes a trap, such as a software interrupt or a fast system-call instruction, that transfers control to kernel code running in ring 0; the thread's user-mode execution remains suspended until the call returns from kernel mode.) Although they are slower, mutexes are much more powerful and flexible. They can be shared across processes, a waiting thread can specify a time-out period, and a waiting thread can also determine whether the thread that owned the mutex terminated or whether the mutex was deleted.
Critical sections are much lighter-weight objects, and therefore much faster than mutexes. In the most common case of uncontended acquires, critical sections are incredibly fast because they just atomically increment a value in user mode and return immediately. (Internally, the InterlockedCompareExchange API is used to "acquire" the critical section.)
Critical sections only switch to kernel mode when there is contention over the acquisition. In that case, the critical section allocates a semaphore internally, storing it in a dedicated field of the critical section's structure (a field that is initially unallocated). So, under contention, performance degrades to roughly that of a mutex, because you are effectively using one: the user-mode thread is suspended and enters kernel mode to wait on the semaphore or an event.
Critical sections in Windows are somewhat akin to "futexes" in Linux. A futex is a "Fast User-space muTEX" that, like a critical section, only switches to kernel-mode when arbitration is required.
The performance benefit of a critical section comes with serious caveats: the inability to specify a wait time-out period, the inability of a thread to determine whether the owning thread was terminated before it released the critical section, the inability to determine whether the critical section was deleted, and the inability to use critical sections across processes (critical sections are process-local objects).
As such, you should keep the following guidelines in mind when deciding between critical sections and mutexes:
- Prefer a critical section when mutual exclusion is needed only among threads of a single process and speed matters.
- Prefer a mutex when you need to synchronize across processes, wait with a time-out, or detect that the owning thread terminated (an abandoned mutex).
You'll find lots of benchmarks online showing the relative performance difference between critical sections and mutexes, including in the article you link, which says critical sections are 25 times faster than mutexes. I have a comment here in my class library from an article I read a long time ago that says, "On a Pentium II 300 MHz, the round-trip for a critical section (assuming no contention, so no context switching required) takes 0.29 µs. With a mutex, it takes 5.3 µs." The consensus seems to be somewhere between 15 and 30 times faster when you can avoid the kernel-mode transition. I didn't bother to benchmark it myself. :-)
Further reading:
Critical Section Objects on MSDN:
A critical section object provides synchronization similar to that provided by a mutex object, except that a critical section can be used only by the threads of a single process. Event, mutex, and semaphore objects can also be used in a single-process application, but critical section objects provide a slightly faster, more efficient mechanism for mutual-exclusion synchronization (a processor-specific test and set instruction). Like a mutex object, a critical section object can be owned by only one thread at a time, which makes it useful for protecting a shared resource from simultaneous access. Unlike a mutex object, there is no way to tell whether a critical section has been abandoned.
[ … ]
A thread uses the EnterCriticalSection or TryEnterCriticalSection function to request ownership of a critical section. It uses the LeaveCriticalSection function to release ownership of a critical section. If the critical section object is currently owned by another thread, EnterCriticalSection waits indefinitely for ownership. In contrast, when a mutex object is used for mutual exclusion, the wait functions accept a specified time-out interval.
INFO: Critical Sections Versus Mutexes, also on MSDN:
Critical sections and mutexes provide synchronization that is very similar, except that critical sections can be used only by the threads of a single process. There are two areas to consider when choosing which method to use within a single process:
Speed. The Synchronization overview says the following about critical sections:
... critical section objects provide a slightly faster, more efficient mechanism for mutual-exclusion synchronization. Critical sections use a processor-specific test and set instruction to determine mutual exclusion.
Deadlock. The Synchronization overview says the following about mutexes:
If a thread terminates without releasing its ownership of a mutex object, the mutex is considered to be abandoned. A waiting thread can acquire ownership of an abandoned mutex, but the wait function's return value indicates that the mutex is abandoned.
WaitForSingleObject() will return WAIT_ABANDONED for a mutex that has been abandoned. However, the resource that the mutex is protecting is left in an unknown state. There is no way to tell whether a critical section has been abandoned.
The article you link to in the question also links to this post on Larry Osterman's blog, which gives some more interesting details about the implementation.
Upvotes: 2