Thread Safety With Parallel Operations

Before I start, I should mention that I feel like I've got the wrong end of the stick here. But here we go anyway:

Imagine we have the following class:

public class SomeObject {
    public int SomeInt;
    private SomeObject anotherObject;

    public void DoStuff() {
        if (SomeCondition()) anotherObject.SomeInt += 1;
    }
}

Now, imagine that we have a collection of these SomeObjects:

IList<SomeObject> allObjects = new List<SomeObject>(1000);
// ... Pretend the list is populated with 1000 SomeObjects here

Let's say I call DoStuff() on each one, like so:

foreach (var @object in allObjects) @object.DoStuff();

All is good so far. Now, let's assume that the order in which the objects have their DoStuff() called is not important. Assume that SomeCondition() is computationally expensive, perhaps. I could utilize all four cores on my machine (and potentially get a performance gain) with:

Parallel.For(0, 1000, i => allObjects[i].DoStuff());

Now, ignoring any issues with atomicity of variable access, I don't care whilst I am in the loop whether or not any given SomeObject sees an outdated version of anotherObject or SomeInt.* However, once the loop is done, I want to make sure that my main worker thread (i.e. the one that called Parallel.For) DOES see everything up-to-date.

Is there a guarantee of this (e.g. some sort of memory barrier?) with using Parallel.For? Or do I need to make some sort of guarantee myself? Or is there no way to make this guarantee?

Finally, if I call Parallel.For(...) again in the same way just after, will all worker threads be working with the new, up-to-date values for everything?

(*) The implementers of DoStuff() would be wrong to make assumptions about the order of processing anyway, right?

Upvotes: 2

Answers (2)

Ade Miller

Reputation: 13723

There are two issues here.

"However, once the loop is done, I want to make sure that my main worker thread (i.e. the one that called Parallel.For) DOES see everything up-to-date."

To answer your question: yes. Once your Parallel.For has completed, all the calls to DoStuff will have completed and your array will not see any more updates.
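As a rough sketch of that point (the class and method names below are made up for illustration and are not from the question): Parallel.For blocks the calling thread until every iteration has finished, so, as far as I know, results written inside the loop can be read by the caller afterwards without extra synchronization. Each iteration here writes only its own slot, so the loop body itself has no race.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical demo class, not part of the original question.
public static class CompletionDemo
{
    public static int CountCompleted(int n)
    {
        var done = new int[n];

        // Each iteration writes only to its own slot, so iterations never conflict.
        Parallel.For(0, n, i => { done[i] = 1; });

        // Parallel.For has returned, so every iteration has finished and
        // the calling thread can read all of the results.
        return done.Sum();
    }
}
```

Calling CompletionDemo.CountCompleted(1000) should return 1000, because no iteration can still be running once the loop call returns.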

"Now, ignoring any issues with atomicity of variable access, I don't care whilst I am in the loop whether or not any given SomeObject sees an outdated version of anotherObject or SomeInt.*"

I really doubt that you don't care about this if you want a correct answer. Bassam's answer addresses the potential data races in your code. If one thread is running DoStuff and writes to another element of the array while a second thread is reading it, you will see nondeterministic results. Locking can solve this (as Bassam's example shows), but at the expense of performance: taking a lock on every thread for every update effectively serializes your work. I suspect that Bassam's locked example runs no faster, and possibly slower, than the non-locking one, although it does produce the correct answer.

If SomeObject.anotherObject refers to anything other than this, you have a potential race condition. Consider the case where anotherObject refers to the element in the array adjacent to the current object. What happens when these run concurrently? One thread's code will be trying to read an instance of SomeObject while another thread writes to it. The write is not guaranteed to happen atomically, so your read may return an object in a half-written state.

This depends a bit on what is being updated in SomeObject and how it's being updated. For example, if all you are doing is incrementing a single integer value, you could use Interlocked operations (such as Interlocked.Increment) to increment it in a thread-safe way, or use critical sections or locks to make SomeObject itself thread-safe. Adding synchronization usually costs performance, so where possible I would recommend looking for an approach that does not require it.
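For the single-integer case, a minimal sketch with Interlocked.Increment might look like this. SharedTarget and InterlockedDemo are hypothetical names standing in for the question's SomeObject; the only assumption is that every iteration increments the same public int field.

```csharp
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for the question's SomeObject.
public class SharedTarget
{
    public int SomeInt;
}

public static class InterlockedDemo
{
    public static int IncrementShared(int n)
    {
        var shared = new SharedTarget();

        // Interlocked.Increment performs the read-modify-write as one atomic
        // operation, so concurrent increments are not lost.
        Parallel.For(0, n, i => { Interlocked.Increment(ref shared.SomeInt); });

        return shared.SomeInt;
    }
}
```

With a plain shared.SomeInt += 1 in the loop body, the returned count could come out lower than n. With the interlocked version it should always equal n.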

You can fix this in one of two ways.

1) If each instance of anotherObject in the array is guaranteed to be only updated once by one call to allObjects[i].DoStuff() then you can modify your code to have an input and output array. This prevents any race conditions as reads and writes no longer conflict. It means you need two copies of your array and they both need to be initialized.

2) If you are updating array items multiple times, or having two arrays of SomeObject is not an option, and SomeCondition() is the only computationally expensive part of your method, then you could parallelize just that part and then update the array sequentially.

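bool[] allConditions = new bool[1000]; // an array, so every slot exists before the loop starts
// Parallel phase: each iteration writes only its own slot of allConditions, never allObjects
Parallel.For(0, 1000, i => { allConditions[i] = SomeCondition(i); });
// Sequential phase: apply the updates on a single thread
for (int i = 0; i < 1000; ++i) { allObjects[i].DoStuff(allConditions[i]); }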
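Option 1 above (separate input and output arrays) can be sketched roughly as follows. The names DoubleBufferDemo and Step are invented for illustration, and the "add the right-hand neighbour" rule is only an assumed stand-in for whatever DoStuff really computes.

```csharp
using System.Threading.Tasks;

// Hypothetical example of the input/output array approach.
public static class DoubleBufferDemo
{
    // Each output element is computed from its own input element and the
    // element to its right (wrapping around at the end).
    public static int[] Step(int[] input)
    {
        var output = new int[input.Length];

        Parallel.For(0, input.Length, i =>
        {
            // input is only ever read inside the loop, so concurrent access is safe.
            int neighbour = input[(i + 1) % input.Length];

            // Iteration i is the only writer of output[i].
            output[i] = input[i] + neighbour;
        });

        return output;
    }
}
```

For example, Step(new[] { 1, 2, 3, 4 }) yields { 3, 5, 7, 5 } regardless of the order in which the iterations run, because no iteration reads anything that another iteration writes.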

So your observation:

"This is interesting. It means that Parallel.For is basically only useful for code that's already thread-safe... Damn"

is not entirely correct. The code within your Parallel.For must either be thread-safe or avoid accessing data and resources in a non-thread-safe way. In other words, it doesn't have to lock if you can rearrange your code to guarantee that there are no race conditions (or deadlocks), because no thread writes data that another thread writes or may be reading. Note that concurrent reads are OK.
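One way to rearrange a shared-accumulator loop (such as the locked sum in Bassam's answer) so that it does not lock on every update is to give each worker its own subtotal and combine the subtotals once at the end. This is only a sketch: LocalSumDemo is a made-up name, and it uses the Parallel.For overload that takes localInit and localFinally delegates.

```csharp
using System.Threading;
using System.Threading.Tasks;

// Hypothetical example: sums (i + 1) over a range without a per-iteration lock.
public static class LocalSumDemo
{
    public static long Sum(int fromInclusive, int toExclusive)
    {
        long total = 0;

        Parallel.For<long>(fromInclusive, toExclusive,
            // localInit: each worker task starts with its own subtotal of zero.
            () => 0L,
            // body: touches only the task-local subtotal, so no synchronization is needed.
            (i, loopState, subtotal) => subtotal + i + 1,
            // localFinally: runs once per worker task and merges its subtotal atomically.
            subtotal => { Interlocked.Add(ref total, subtotal); });

        return total;
    }
}
```

LocalSumDemo.Sum(1, 10000000) should give 50000004999999, the same value as the locked version, but with one synchronized operation per worker task rather than one per iteration.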

Upvotes: 1

Bassam Alugili

Reputation: 17003

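        var locker = new object();
        var total = 0.0;
        // Every update takes the lock, so the read-modify-write on total cannot interleave.
        Parallel.For(1, 10000000,
        i => { lock (locker) total += (i + 1); });
        Console.WriteLine("WithLocker " + total);

        var total2 = 0.0;
        // No lock: concurrent += operations race with each other and updates are lost.
        Parallel.For(1, 10000000,
        i => total2 += (i + 1));
        Console.WriteLine("WithoutLocker " + total2);
        Console.ReadKey();

        // Output from one run (the unlocked total varies from run to run):
        // WithLocker 50000004999999
        // WithoutLocker 28861729333278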

I have made two examples for you, one with a lock and one without. Look at the results!

Upvotes: 1
