diaz994

Reputation: 362

Effect of lock keyword on a Parallel.ForEach loop

This is more of a conceptual question. I was wondering whether using a lock inside a Parallel.ForEach loop would take away the benefits of parallelizing the loop.

Here is some sample code where I have seen it done.

Parallel.ForEach<KeyValuePair<string, XElement>>(binReferences.KeyValuePairs, reference =>
{
    lock (fileLockObject)
    {
        if (fileLocks.ContainsKey(reference.Key) == false)
        {
            fileLocks.Add(reference.Key, new object());
        }
    }

    RecursiveBinUpdate(reference.Value, testPath, reference.Key, maxRecursionCount,
        ref recursionCount);

    lock (fileLocks[reference.Key])
    {
        reference.Value.Document.Save(reference.Key);
    }
});

Where fileLockObject and fileLocks are as follows.

private static object fileLockObject = new object();
private static Dictionary<string, object> fileLocks = new Dictionary<string, object>();

Does this technique make the loop effectively non-parallel? I would like to hear your thoughts on this.

Upvotes: 1

Views: 2200

Answers (5)

Theodor Zoulias

Reputation: 43996

There is nothing conceptually wrong with using the lock statement in a Parallel.ForEach loop. If done correctly, the effect of the lock on the overall performance of the parallel operation can be totally negligible. An uncontended lock is really cheap: it can be acquired and released in as little as 20 nanoseconds on a 2010-era computer, in other words it can be acquired and released 50,000,000 times per second (citation). On the other hand, if the lock is used incorrectly, it can completely serialize the body delegate, negating all the benefits of parallelization. Or even worse, as in the code snippet presented in your question, it can be applied incorrectly and allow concurrent access to non-thread-safe state (fileLocks), resulting in undefined behavior.

The general advice is: do the least amount of work while holding the lock, and release it at the earliest opportunity. Don't do work inside the lock that can be done outside the lock.

Regarding the specific problem that the code in the question is attempting to solve, which is to prevent parallel processing of reference objects having the same Key, locking on the Key might not be the optimal strategy. That's because it blocks ThreadPool threads, and because it may reduce the degree of parallelism. Your code does not specify the MaxDegreeOfParallelism, implying unlimited parallelism, which in practice means that the effective degree of parallelism is determined by ThreadPool availability, which is a bit chaotic¹. If you change your mind and decide to specify the MaxDegreeOfParallelism, and you also want to maintain a constant degree of parallelism during the whole operation, the simple use of the lock won't cut it: you want the parallelism devoted to doing actual work, not occupied by waits on a lock. You could look at this answer for a more sophisticated approach to this specific problem. That answer is about the asynchronous Parallel.ForEachAsync, but adapting it to the synchronous Parallel.ForEach should be straightforward.

¹ My personal advice is to always specify the MaxDegreeOfParallelism when using the Parallel APIs.
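For illustration, a minimal sketch of passing a ParallelOptions with an explicit MaxDegreeOfParallelism (the items and the loop body are placeholders, not code from the question):

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

class Program
{
    static void Main()
    {
        int[] items = { 1, 2, 3, 4, 5, 6, 7, 8 };

        // Cap the parallelism explicitly instead of leaving it to
        // ThreadPool availability, which varies between runs.
        var options = new ParallelOptions { MaxDegreeOfParallelism = 4 };

        int sum = 0;
        Parallel.ForEach(items, options, item =>
        {
            // Placeholder for the real per-item work.
            Interlocked.Add(ref sum, item);
        });

        Console.WriteLine(sum); // 36
    }
}
```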

Upvotes: 0

Henk Holterman

Reputation: 273844

if I used a lock ... if that would take away the benefits of parallelizing a foreach loop.

Proportionally. When RecursiveBinUpdate() is a big chunk of work (and independent), then it will still pay off. The locking part could be less than 1%, or it could be 99%. Look up Amdahl's law; it applies here.
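To make that concrete, a quick back-of-the-envelope with Amdahl's law, assuming 8 cores (the percentages are illustrative, not measured from the question's code):

```csharp
using System;
using System.Globalization;

class Program
{
    // Amdahl's law: with a fraction p of the work parallelizable
    // across n cores, the overall speedup is 1 / ((1 - p) + p / n).
    static double Speedup(double p, int n) => 1.0 / ((1.0 - p) + p / n);

    static void Main()
    {
        // Locking is ~1% of the work: near-linear speedup.
        Console.WriteLine(Speedup(0.99, 8).ToString("F2", CultureInfo.InvariantCulture)); // 7.48

        // Locking is ~99% of the work: parallelism buys almost nothing.
        Console.WriteLine(Speedup(0.01, 8).ToString("F2", CultureInfo.InvariantCulture)); // 1.01
    }
}
```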

But worse, your code is not thread-safe. Of your two operations on fileLocks, only the first is actually inside a lock.

lock (fileLockObject)
{
    if (fileLocks.ContainsKey(reference.Key) == false)
    {
       ...
    }
}

and

lock (fileLocks[reference.Key])   // this access to fileLocks[] is not protected

change the 2nd part to:

lock (fileLockObject)
{        
    reference.Value.Document.Save(reference.Key);
}

And the use of ref recursionCount as a parameter looks suspicious too. It might work with Interlocked.Increment though.
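A sketch of what that would look like, with the shared counter named after the question's variable (the loop body is a stand-in for the real recursive work):

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

class Program
{
    // Shared counter, analogous to the question's ref recursionCount.
    static int recursionCount;

    static void Main()
    {
        Parallel.For(0, 1000, _ =>
        {
            // Atomic increment; a plain recursionCount++ on a shared
            // int would lose updates under parallel execution.
            Interlocked.Increment(ref recursionCount);
        });

        Console.WriteLine(recursionCount); // 1000
    }
}
```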

Upvotes: 3

Drew Marsh

Reputation: 33379

When it comes to locks, there's no difference in how PLINQ/TPL threads have to wait to gain access. So, in your case, the loop is only non-parallel in the regions you're locking; any work outside those locks will still execute in parallel (i.e. all the work in RecursiveBinUpdate).

Bottom line, I see nothing substantially wrong with what you're doing here.

Upvotes: 1

Servy

Reputation: 203847

It means all of the work inside the lock can't be done in parallel, which greatly harms performance here, yes. Since the entire body is not locked (and not all on the same object), there is still some parallelization. Whether the parallelization you do get adds enough benefit to surpass the overhead of managing the threads and synchronizing around the locks is something you really just need to test with your specific data.

That said, it looks like what you're doing (at least in the first locked block, which is the one I'd be more concerned with, as every thread is locking on the same object) is locking access to a Dictionary. You can instead use a ConcurrentDictionary, which is specifically designed to be used from multiple threads, and which will minimize the amount of synchronization that needs to be done.
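For example, a sketch with the file save simulated and placeholder file names; GetOrAdd atomically returns the existing lock object for a key or creates one, replacing both the locked ContainsKey/Add block and the unprotected indexer access:

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

class Program
{
    // ConcurrentDictionary synchronizes internally, so no outer lock
    // is needed to create or look up the per-key lock objects.
    static readonly ConcurrentDictionary<string, object> fileLocks =
        new ConcurrentDictionary<string, object>();

    static void Main()
    {
        var keys = new[] { "a.xml", "b.xml", "a.xml", "c.xml" };

        Parallel.ForEach(keys, key =>
        {
            // Atomically fetch or create this key's lock object.
            object fileLock = fileLocks.GetOrAdd(key, _ => new object());

            lock (fileLock)
            {
                // Save the document here; simulated with a no-op.
            }
        });

        Console.WriteLine(fileLocks.Count); // 3 (one lock per distinct key)
    }
}
```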

Upvotes: 3

Reed Copsey

Reputation: 564911

The "locked" portion of the loop will end up running serially. If the RecursiveBinUpdate function is the bulk of the work, there may be some gain, but it would be better if you could figure out how to handle the lock generation in advance.
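One way to handle the lock generation in advance, sketched with placeholder keys: build all the per-key lock objects sequentially before the loop, so the parallel body only ever reads the dictionary and never mutates it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

class Program
{
    static void Main()
    {
        var keys = new[] { "a.xml", "b.xml", "a.xml" };

        // Create one lock object per distinct key, before any
        // parallel work starts.
        Dictionary<string, object> fileLocks = keys
            .Distinct()
            .ToDictionary(k => k, _ => new object());

        Parallel.ForEach(keys, key =>
        {
            // Read-only access is safe: no thread modifies fileLocks
            // while the loop runs.
            lock (fileLocks[key])
            {
                // Save the file here; simulated with a no-op.
            }
        });

        Console.WriteLine(fileLocks.Count); // 2
    }
}
```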

Upvotes: 0
