Reputation: 9252
I recently took a stab at implementing a concurrent priority queue using a skip-list data structure. For the sake of this question, if you don't know what a skip-list is, I believe picturing a linked list is sufficient to answer. I attempted minimal locking: allowing multiple enqueues and dequeues simultaneously, locking nodes or their forward pointers only when necessary, releasing locks as soon as possible, traversing the list with Interlocked operations, etc.
I was pleased with the results. However, a regular skip-list with the add and remove operations surrounded by a single syncroot lock (i.e. allowing only one operation at any given time) was actually twice as fast.
I assumed that I must have a bug in my implementation. However, even a 'Concurrent Priority Queue' listed on the Microsoft website only actually allows one operation at a time (i.e. a syncroot lock around enqueue and dequeue).
As a general rule (and forgive me if this question is too general), at what point does more granular locking actually lead to a performance improvement? I imagine that in my case, since I actually have to traverse large lists with Interlocked.Exchange (is there a better way?) as well as perform multiple test-and-test-and-set operations, etc., this slows down the enqueues and dequeues.
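For reference, here is roughly the kind of lock-free traversal I mean. This is only a sketch: the `Node` class is a stand-in, it's a single-level sorted list rather than a real skip-list, and it ignores concurrent deletion entirely (which a real implementation has to handle, e.g. with marked pointers). The read path takes no locks, and an insert is published with a single `CompareExchange`, retrying if another thread got there first.

```csharp
using System;
using System.Threading;

class Node
{
    public int Key;
    public Node Next; // read/written only via Volatile/Interlocked
}

static class LockFreeTraversal
{
    // Walks the sorted list without taking a lock, then tries to splice
    // the new node in with one CAS. If the CAS fails, another thread
    // changed pred.Next under us, so we retry from the head.
    public static void Insert(Node head, Node newNode)
    {
        while (true)
        {
            Node pred = head;
            Node curr = Volatile.Read(ref pred.Next);

            // Find the insertion point (list is sorted by Key).
            while (curr != null && curr.Key < newNode.Key)
            {
                pred = curr;
                curr = Volatile.Read(ref curr.Next);
            }

            newNode.Next = curr;

            // Test-and-test-and-set style: publish only if pred.Next is
            // still the value we observed during the traversal.
            if (Interlocked.CompareExchange(ref pred.Next, newNode, curr) == curr)
                return;
        }
    }
}
```

Every insert pays for the traversal reads plus at least one CAS, and more under contention, which is my suspicion for where the time goes.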
Additionally, is there a tool that could help me determine where the majority of the time is being spent? Thanks.
Upvotes: 1
Views: 964
Reputation: 203833
There is overhead in constantly checking whether a locked region is currently free, setting the lock, and then releasing it when leaving. If, in practice, you very rarely compete with anyone else for access to that critical section, you have performed all of that overhead for no benefit. If, on the other hand, there are lots and lots of threads trying to use your data structure, then you may find that the increased performance from running operations in parallel (if you have a many-core CPU) exceeds the overhead of managing the locks.
With these types of data structures it's rather uncommon to have several dozen threads all trying to use the structure at the same time while performing operations that wouldn't usually require them to wait for each other. So, in short, you could probably construct test cases in which your finer-grained locking performs better, by adding threads and carefully managing what they are all doing. But if your current tests are a reasonable representation of how you actually plan to use the data structure, then clearly your usage won't benefit from a more elaborate locking scheme.
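To make that concrete, a throwaway harness along these lines lets you sweep the thread count and watch for a crossover point. The names here are my own, and the delegate is just a placeholder: you would pass in your queue's enqueue/dequeue for each of the two implementations and compare the timings at 1, 2, 8, 32 threads.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

static class ContentionTest
{
    // Runs 'op' iterationsPerThread times on each of 'threads' workers
    // and returns the total wall-clock time. Run it once per queue
    // implementation at each thread count you care about.
    public static TimeSpan Measure(int threads, int iterationsPerThread, Action op)
    {
        var tasks = new Task[threads];
        var sw = Stopwatch.StartNew();
        for (int t = 0; t < threads; t++)
        {
            tasks[t] = Task.Factory.StartNew(() =>
            {
                for (int i = 0; i < iterationsPerThread; i++)
                    op();
            });
        }
        Task.WaitAll(tasks);
        return sw.Elapsed;
    }
}
```

At one thread the coarse lock should win (no contention, so the fine-grained bookkeeping is pure overhead); whether the fine-grained version ever overtakes it tells you if the extra complexity is buying anything for your workload.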
Upvotes: 3
Reputation: 2786
The more you synchronize, the more your application will suffer a performance penalty. A single syncroot lock can cause contention and is not the way to go in complex situations.
The trick is to find the right balance in synchronization: just enough to make it reliable, e.g. without race conditions.
Have a look at the System.Collections.Concurrent namespace.
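For what it's worth, that namespace has no concurrent priority queue built in (as of .NET 4), but if your priorities fall into a small fixed range you can approximate one by bucketing lock-free `ConcurrentQueue<T>` instances. The class below is my own sketch, not a library type, and it is not a general priority queue (priorities must be small non-negative integers), but every operation is lock-free out of the box:

```csharp
using System.Collections.Concurrent;

// Sketch: one lock-free ConcurrentQueue per priority level,
// where priority 0 is the highest.
class BucketedPriorityQueue<T>
{
    private readonly ConcurrentQueue<T>[] _levels;

    public BucketedPriorityQueue(int priorityLevels)
    {
        _levels = new ConcurrentQueue<T>[priorityLevels];
        for (int i = 0; i < priorityLevels; i++)
            _levels[i] = new ConcurrentQueue<T>();
    }

    public void Enqueue(int priority, T item)
    {
        _levels[priority].Enqueue(item);
    }

    public bool TryDequeue(out T item)
    {
        // Scan from the highest-priority bucket down;
        // each TryDequeue is itself lock-free.
        foreach (ConcurrentQueue<T> q in _levels)
            if (q.TryDequeue(out item))
                return true;
        item = default(T);
        return false;
    }
}
```

Whether this beats a single coarse lock still depends on contention, as the other answer explains, but it at least lets the framework's lock-free code do the heavy lifting.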
Upvotes: 1