I've suggested to the maintainers of the D programming language runtime a few times that the memory allocator/garbage collector should use spinlocks instead of regular OS critical sections. This hasn't really caught on. Here are the reasons I think spinlocks would be better: At least in synthetic benchmarks that I did, it's several times faster than OS critical sections when there's contention for the memory allocator/GC lock. Edit: Empirically, using spinlocks didn't even have measurable overhead in a single-core environment, probably because locks need to be held for such a short period of time in a memory allocator. Memory allocations and similar operations usually take a small fraction of a timeslice, and even a small fraction of the time a context switch takes, making it silly to context switch in the case of contention. A garbage collection in the implementation in question stops the world anyhow. There won't be any spinning during a collection. Are there any good reasons not to use spinlocks in a memory allocator/garbage collector implementation?

memory-managementconcurrencygarbage-collectionmutexd

Reputation: 68750

Are spinlocks a good choice for a memory allocator?

I've suggested to the maintainers of the D programming language runtime a few times that the memory allocator/garbage collector should use spinlocks instead of regular OS critical sections. This hasn't really caught on. Here are the reasons I think spinlocks would be better:

At least in synthetic benchmarks that I did, it's several times faster than OS critical sections when there's contention for the memory allocator/GC lock. Edit: Empirically, using spinlocks didn't even have measurable overhead in a single-core environment, probably because locks need to be held for such a short period of time in a memory allocator.
Memory allocations and similar operations usually take a small fraction of a timeslice, and even a small fraction of the time a context switch takes, making it silly to context switch in the case of contention.
A garbage collection in the implementation in question stops the world anyhow. There won't be any spinning during a collection.

Are there any good reasons not to use spinlocks in a memory allocator/garbage collector implementation?

Upvotes: 8

Answers (6)

J D

Reputation: 48687

One of the perf bugs in the Glasgow Haskell Compiler's garbage collector is so annoying that it has a name, the "last core slowdown". This is a direct consequence of their inappropriate use of spinlocks in their GC and is excacerbated on Linux due to its scheduler but, in fact, the effect can be observed whenever other programs are competing for CPU time.

The effect is clear on the second graph here and can be seen affecting more than just the last core here, where the Haskell program sees performance degradation beyond only 5 cores.

Upvotes: 1

ben

Reputation: 223

Not sure if i agree, as memory allocations CAN take a very long time ( the only way it doesnt if you preallocate all the memory and then re issue it) ..You really need to try the same allocations and deallocations with multi gig heap sizes with millions of entries , with many applications hit the allocation critical section ( note applications and not threads) and with disk trashing/swapping from not enough memory. You also can get disk swapping issues durring allocation and doing a spin lock waiting for a disk request is certainly not appropriate.

And as CyberShadow mentioned on single threaded CPU you end up going to a normal lock with an overhead. Now a language may run on many embedded CPUS which are all single threaded.

Also if you can get away with an interlocked exchange , that is best ( as it is lockless though still stalls the CPU and raises LOCK# for multi core memory) but most locks use this anyway ( but need to do more) . However the structure of a heap normally means an interlocked exchance is not enough and you end up creating a critical section. Note it is possible in a ( Generational) Mark Sweep Nursery with a GC to do allocations as a interlocked compare and add of the pointer. I do this for the Cosmos C# OS GC and it makes for stack speed allocations.

Upvotes: 0

Jason Orendorff

Reputation: 45116

Obviously the worst-case behavior of a spinlock is awful (the OS scheduler just sees 30 CPU-bound threads, so it tries to give them all some CPU time; 29 of them spin like mad while the thread that holds the lock sleeps), so you should avoid them if you can. Plenty of people smarter than me claim spinlocks have no userspace use cases because of this.
System mutexes should spin a little before putting the thread to sleep (or indeed making any kind of system call), so they can sometimes perform exactly the same as spinlocks even when there's some contention.
An allocator can often practically eliminate lock contention by using the lock only to allocate pages to threads. Each thread is then responsible for partitioning its own pages. You end up acquiring the lock only once every N allocations, and you can configure N to whatever you like.

I consider 2 and 3 to be strong arguments that can't be effectively countered by synthetic benchmarks. You would need to show that the performance of a real-world program suffers.

Upvotes: 3

ChrisW

Reputation: 56113

Are there any good reasons not to use spinlocks in a memory allocator/garbage collector implementation?

When some threads are compute-bound (CPU-bound) and other threads are memory-allocator-bound, then using spinlocks takes CPU cycles which could otherwise be used either by the compute-bound threads and/or used by threads which belong to other processes.

Upvotes: 2

Michael Burr

Reputation: 340258

On Windows anyway, critical section objects already have the option of doing this (http://msdn.microsoft.com/en-us/library/ms682530.aspx):

A thread uses the InitializeCriticalSectionAndSpinCount or SetCriticalSectionSpinCount function to specify a spin count for the critical section object. Spinning means that when a thread tries to acquire a critical section that is locked, the thread enters a loop, checks to see if the lock is released, and if the lock is not released, the thread goes to sleep. On single-processor systems, the spin count is ignored and the critical section spin count is set to 0 (zero). On multiprocessor systems, if the critical section is unavailable, the calling thread spins dwSpinCount times before performing a wait operation on a semaphore that is associated with the critical section. If the critical section becomes free during the spin operation, the calling thread avoids the wait operation.

Hopefully other platforms will follow suit if they don't already.

Upvotes: 2

Vladimir Panteleev

Reputation: 25187

Spinlocks are absolutely worthless on systems with only one CPU/core, or - more generally - in high-contention situtations (when you have many threads waiting on the lock).

Upvotes: 2

Are spinlocks a good choice for a memory allocator?

Answers (6)

Related Questions