Reputation: 68750
I've suggested to the maintainers of the D programming language runtime a few times that the memory allocator/garbage collector should use spinlocks instead of regular OS critical sections. This hasn't really caught on. Here are the reasons I think spinlocks would be better:
Are there any good reasons not to use spinlocks in a memory allocator/garbage collector implementation?
Upvotes: 8
Views: 1650
Reputation: 48687
One of the perf bugs in the Glasgow Haskell Compiler's garbage collector is so annoying that it has a name, the "last core slowdown". This is a direct consequence of their inappropriate use of spinlocks in their GC and is excacerbated on Linux due to its scheduler but, in fact, the effect can be observed whenever other programs are competing for CPU time.
The effect is clear on the second graph here and can be seen affecting more than just the last core here, where the Haskell program sees performance degradation beyond only 5 cores.
Upvotes: 1
Reputation: 223
Not sure if i agree, as memory allocations CAN take a very long time ( the only way it doesnt if you preallocate all the memory and then re issue it) ..You really need to try the same allocations and deallocations with multi gig heap sizes with millions of entries , with many applications hit the allocation critical section ( note applications and not threads) and with disk trashing/swapping from not enough memory. You also can get disk swapping issues durring allocation and doing a spin lock waiting for a disk request is certainly not appropriate.
And as CyberShadow mentioned on single threaded CPU you end up going to a normal lock with an overhead. Now a language may run on many embedded CPUS which are all single threaded.
Also if you can get away with an interlocked exchange , that is best ( as it is lockless though still stalls the CPU and raises LOCK# for multi core memory) but most locks use this anyway ( but need to do more) . However the structure of a heap normally means an interlocked exchance is not enough and you end up creating a critical section. Note it is possible in a ( Generational) Mark Sweep Nursery with a GC to do allocations as a interlocked compare and add of the pointer. I do this for the Cosmos C# OS GC and it makes for stack speed allocations.
Upvotes: 0
Reputation: 45116
Obviously the worst-case behavior of a spinlock is awful (the OS scheduler just sees 30 CPU-bound threads, so it tries to give them all some CPU time; 29 of them spin like mad while the thread that holds the lock sleeps), so you should avoid them if you can. Plenty of people smarter than me claim spinlocks have no userspace use cases because of this.
System mutexes should spin a little before putting the thread to sleep (or indeed making any kind of system call), so they can sometimes perform exactly the same as spinlocks even when there's some contention.
An allocator can often practically eliminate lock contention by using the lock only to allocate pages to threads. Each thread is then responsible for partitioning its own pages. You end up acquiring the lock only once every N allocations, and you can configure N to whatever you like.
I consider 2 and 3 to be strong arguments that can't be effectively countered by synthetic benchmarks. You would need to show that the performance of a real-world program suffers.
Upvotes: 3
Reputation: 56113
Are there any good reasons not to use spinlocks in a memory allocator/garbage collector implementation?
When some threads are compute-bound (CPU-bound) and other threads are memory-allocator-bound, then using spinlocks takes CPU cycles which could otherwise be used either by the compute-bound threads and/or used by threads which belong to other processes.
Upvotes: 2
Reputation: 340258
On Windows anyway, critical section objects already have the option of doing this (http://msdn.microsoft.com/en-us/library/ms682530.aspx):
A thread uses the InitializeCriticalSectionAndSpinCount or SetCriticalSectionSpinCount function to specify a spin count for the critical section object. Spinning means that when a thread tries to acquire a critical section that is locked, the thread enters a loop, checks to see if the lock is released, and if the lock is not released, the thread goes to sleep. On single-processor systems, the spin count is ignored and the critical section spin count is set to 0 (zero). On multiprocessor systems, if the critical section is unavailable, the calling thread spins dwSpinCount times before performing a wait operation on a semaphore that is associated with the critical section. If the critical section becomes free during the spin operation, the calling thread avoids the wait operation.
Hopefully other platforms will follow suit if they don't already.
Upvotes: 2
Reputation: 25187
Spinlocks are absolutely worthless on systems with only one CPU/core, or - more generally - in high-contention situtations (when you have many threads waiting on the lock).
Upvotes: 2