Reputation: 362
This is more of a conceptual question. I was wondering whether using a lock
inside a Parallel.ForEach
loop would take away the benefits of parallelizing a foreach
loop.
Here is some sample code where I have seen it done.
Parallel.ForEach<KeyValuePair<string, XElement>>(binReferences.KeyValuePairs, reference =>
{
    lock (fileLockObject)
    {
        if (fileLocks.ContainsKey(reference.Key) == false)
        {
            fileLocks.Add(reference.Key, new object());
        }
    }

    RecursiveBinUpdate(reference.Value, testPath, reference.Key, maxRecursionCount,
        ref recursionCount);

    lock (fileLocks[reference.Key])
    {
        reference.Value.Document.Save(reference.Key);
    }
});
Where fileLockObject
and fileLocks
are as follows.
private static object fileLockObject = new object();
private static Dictionary<string, object> fileLocks = new Dictionary<string, object>();
Does this technique completely make the loop not parallel? I would like to see your thoughts on this.
Upvotes: 1
Views: 2200
Reputation: 43996
There is nothing conceptually wrong with using the lock
statement in a Parallel.ForEach
loop. If done correctly, the effect of the lock
on the overall performance of the parallel operation can be totally negligible. An uncontended lock
is really cheap. It can be acquired and released in as little as 20 nanoseconds on a 2010-era computer; in other words, it can be acquired and released 50,000,000 times per second (citation). On the other hand, if the lock
is used incorrectly, it can completely serialize the body
delegate, negating all the benefits of parallelization. Or even worse, as in the code snippet presented in your question, it can be applied incorrectly and allow concurrent access to non-thread-safe state (fileLocks
), resulting in undefined behavior.
The general advice is: do the least amount of work while holding the lock
, and release it at the earliest opportunity. Don't do work inside the lock
that can be done outside the lock
.
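As a sketch of that advice (the class name, method name, and squaring workload below are all invented for illustration), the loop body does its computation outside the lock and holds the lock only for the brief shared-state mutation:

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

class LockScopeDemo
{
    static readonly object _sync = new object();

    public static List<int> SquareAll(int[] items)
    {
        var results = new List<int>();
        Parallel.ForEach(items, item =>
        {
            // The expensive work happens outside the lock, in parallel.
            int computed = item * item;

            // Only the brief shared-list mutation is serialized.
            lock (_sync)
            {
                results.Add(computed);
            }
        });
        results.Sort();
        return results;
    }

    static void Main()
    {
        Console.WriteLine(string.Join(",", SquareAll(new[] { 1, 2, 3, 4, 5 })));
        // prints 1,4,9,16,25
    }
}
```

Because the lock is held for only a list append, contention stays negligible no matter how heavy the per-item computation is.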
Regarding the specific problem that the code in the question is attempting to solve, which is to prevent parallel processing for reference
objects having the same Key
, locking on the Key
might not be the optimal strategy. That's because it blocks ThreadPool
threads, and also because it may result in reduced degree of parallelism. Your code does not specify the MaxDegreeOfParallelism
, implying unlimited parallelism, which in practice means that the effective degree of parallelism will be determined by the ThreadPool
availability, which is a bit chaotic¹. In case you change your mind and decide to specify the MaxDegreeOfParallelism
, and you also desire to maintain a constant degree of parallelism during the whole operation, the simple use of the lock
won't cut it. You want the parallelization devoted to doing actual work, not occupied by waiting on a lock. You could look at this answer for a more sophisticated approach to this specific problem. That answer is about the asynchronous Parallel.ForEachAsync
, but adjusting it to the synchronous Parallel.ForEach
should be straightforward.
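For illustration, a minimal sketch of passing an explicit cap through ParallelOptions (the class name, method name, and summing workload are invented; Environment.ProcessorCount is one reasonable starting value for the cap):

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

class DopDemo
{
    public static long SumWithCappedDop(int[] numbers, int maxDop)
    {
        // Cap the parallelism explicitly instead of relying on
        // ThreadPool availability, which varies at runtime.
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxDop };

        long sum = 0;
        Parallel.ForEach(numbers, options, n => Interlocked.Add(ref sum, n));
        return sum;
    }

    static void Main()
    {
        Console.WriteLine(SumWithCappedDop(new[] { 1, 2, 3, 4 }, Environment.ProcessorCount));
        // prints 10
    }
}
```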
¹ My personal advice is to always specify the MaxDegreeOfParallelism
when using the Parallel
APIs.
Upvotes: 0
Reputation: 273844
if I used a lock ... if that would take away the benefits of Paralleling a foreach loop.
Proportionally. When RecursiveBinUpdate()
is a big chunk of work (and independent), it will still pay off. The locking part could be less than 1%, or 99%. Look up Amdahl's law; it applies here.
But worse, your code is not thread-safe. Of your two operations on fileLocks
, only the first is actually inside a lock.
lock (fileLockObject)
{
if (fileLocks.ContainsKey(reference.Key) == false)
{
...
}
}
and
lock (fileLocks[reference.Key]) // this access to fileLocks[] is not protected
Change the second part to:
lock (fileLockObject)
{
reference.Value.Document.Save(reference.Key);
}
And the use of ref recursionCount
as a parameter looks suspicious too. It might work with Interlocked.Increment, though.
Upvotes: 3
Reputation: 33379
When it comes to locks, there's no difference in the way PLINQ/TPL threads have to wait to gain access. So, in your case, it only makes the loop not parallel in those areas that you're locking and any work outside those locks is still going to execute in parallel (i.e. all the work in RecursiveBinUpdate
).
Bottom line, I see nothing substantially wrong with what you're doing here.
Upvotes: 1
Reputation: 203847
It means all of the work inside of the lock
can't be done in parallel. This greatly harms the performance here, yes. Since the entire body is not locked (and not all on the same object), there is still some parallelization here, though. Whether the parallelization you do get adds enough benefit to surpass the overhead of managing the threads and synchronizing around the locks is something you really just need to test yourself, with your specific data.
That said, it looks like what you're doing (at least in the first locked block, which is the one I'd be more concerned with, as every thread is locking on the same object) is locking access to a Dictionary
. You can instead use a ConcurrentDictionary
, which is specifically designed to be utilized from multiple threads, and will minimize the amount of synchronization that needs to be done.
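A sketch of that substitution (the class name, method name, and counting workload are invented for illustration): ConcurrentDictionary's GetOrAdd replaces the question's entire first locked block with a single atomic call.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

class PerKeyLockDemo
{
    public static (int DistinctKeys, int Saves) Process(string[] keys)
    {
        var fileLocks = new ConcurrentDictionary<string, object>();

        int saves = 0;
        Parallel.ForEach(keys, key =>
        {
            // GetOrAdd is atomic: no outer lock is needed to create
            // the per-key lock object on first use.
            object keyLock = fileLocks.GetOrAdd(key, _ => new object());

            lock (keyLock)
            {
                // Stand-in for the per-key save; items with different keys
                // proceed in parallel, while same-key items serialize.
                Interlocked.Increment(ref saves);
            }
        });
        return (fileLocks.Count, saves);
    }

    static void Main()
    {
        var result = Process(new[] { "a", "b", "a", "b", "a" });
        Console.WriteLine($"{result.DistinctKeys} {result.Saves}"); // prints 2 5
    }
}
```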
Upvotes: 3
Reputation: 564911
The "locked" portion of the loop will end up running serially. If the RecursiveBinUpdate
function is the bulk of the work, there may be some gain, but it would be better if you could figure out how to handle the lock generation in advance.
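One way to sketch handling the lock generation in advance (the class name, method name, and counting workload are invented for illustration): build the per-key lock dictionary on a single thread before the loop, so the parallel loop only ever reads it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

class PreGeneratedLocksDemo
{
    public static (int LockCount, int Processed) Run(string[] keys)
    {
        // Build the per-key lock dictionary up front, on a single thread,
        // so the parallel loop never has to mutate it.
        Dictionary<string, object> fileLocks =
            keys.Distinct().ToDictionary(k => k, _ => new object());

        int processed = 0;
        Parallel.ForEach(keys, key =>
        {
            // Read-only dictionary access is safe without extra synchronization.
            lock (fileLocks[key])
            {
                Interlocked.Increment(ref processed);
            }
        });
        return (fileLocks.Count, processed);
    }

    static void Main()
    {
        var result = Run(new[] { "x", "y", "x", "z" });
        Console.WriteLine($"{result.LockCount} {result.Processed}"); // prints 3 4
    }
}
```

This removes the first locked block from the loop entirely, at the cost of knowing the key set up front.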
Upvotes: 0