Reputation: 1073
I'm using Parallel.For/Parallel.ForEach loops, and in one particular case I need to nest them. When I print the values in my collection, some of the console output is occasionally missing, and the behavior is not consistent. See the code below.
Parallel.For(0, RunModuleConfigVariables.Count, new ParallelOptions { MaxDegreeOfParallelism = 3 }, index => {
    string log = null;
    int count = 0;
    log += "Module Name " + RunModuleConfigVariables.Keys.ElementAt(index) + " thread: " + Thread.CurrentThread.ManagedThreadId + "\n";
    Parallel.ForEach(RunModuleConfigVariables[RunModuleConfigVariables.Keys.ElementAt(index)], new ParallelOptions { MaxDegreeOfParallelism = 10 }, eachendpoint => {
        log += "\t" + count + " Endpoint Name " + eachendpoint + "\n";
        count++;
    });
    Console.WriteLine(log);
});
Collection:
Collection type is ConcurrentDictionary<string, HashSet<string>>
RunModuleConfigVariables:
{
"Module_1": [
"Module_1_Endpoint_1",
"Module_1_Endpoint_2",
"Module_1_Endpoint_3",
"Module_1_Endpoint_4",
"Module_1_Endpoint_5",
"Module_1_Endpoint_6",
"Module_1_Endpoint_7",
"Module_1_Endpoint_8",
"Module_1_Endpoint_9",
"Module_1_Endpoint_10",
"Module_1_Endpoint_11",
"Module_1_Endpoint_12",
"Module_1_Endpoint_13",
"Module_1_Endpoint_14",
"Module_1_Endpoint_15",
"Module_1_Endpoint_16",
"Module_1_Endpoint_17",
"Module_1_Endpoint_18",
"Module_1_Endpoint_19"
],
"Module_2": [
"Module_2_Endpoint_1",
"Module_2_Endpoint_2",
"Module_2_Endpoint_3"
],
"Module_3": [
"Module_3_Endpoint_1"
]
}
Actual Output:
Module Name Module_1 thread: 4
0 Endpoint Name Module_1_Endpoint_2
1 Endpoint Name Module_1_Endpoint_1
2 Endpoint Name Module_1_Endpoint_4
3 Endpoint Name Module_1_Endpoint_5
4 Endpoint Name Module_1_Endpoint_6
5 Endpoint Name Module_1_Endpoint_7
6 Endpoint Name Module_1_Endpoint_8
18 Endpoint Name Module_1_Endpoint_9
Module Name Module_3 thread: 5
0 Endpoint Name Module_3_Endpoint_1
Module Name Module_2 thread: 1
0 Endpoint Name Module_2_Endpoint_2
1 Endpoint Name Module_2_Endpoint_3
2 Endpoint Name Module_2_Endpoint_1
Expected Output: (Needn't be in same order)
Module Name Module_1 thread: 5
0 Endpoint Name Module_1_Endpoint_2
1 Endpoint Name Module_1_Endpoint_3
2 Endpoint Name Module_1_Endpoint_4
3 Endpoint Name Module_1_Endpoint_5
4 Endpoint Name Module_1_Endpoint_6
5 Endpoint Name Module_1_Endpoint_7
6 Endpoint Name Module_1_Endpoint_8
7 Endpoint Name Module_1_Endpoint_9
8 Endpoint Name Module_1_Endpoint_10
9 Endpoint Name Module_1_Endpoint_11
10 Endpoint Name Module_1_Endpoint_12
11 Endpoint Name Module_1_Endpoint_13
12 Endpoint Name Module_1_Endpoint_14
13 Endpoint Name Module_1_Endpoint_15
14 Endpoint Name Module_1_Endpoint_16
15 Endpoint Name Module_1_Endpoint_17
16 Endpoint Name Module_1_Endpoint_18
17 Endpoint Name Module_1_Endpoint_19
18 Endpoint Name Module_1_Endpoint_1
Module Name Module_2 thread: 4
0 Endpoint Name Module_2_Endpoint_2
1 Endpoint Name Module_2_Endpoint_3
2 Endpoint Name Module_2_Endpoint_1
Module Name Module_3 thread: 1
0 Endpoint Name Module_3_Endpoint_1
Note: The output is not consistent. Sometimes I see all of the child entries and sometimes I don't. How can I understand this, and what can be done to overcome it?
Upvotes: 1
Views: 1955
Reputation: 156708
How can I understand this?
Parallel processing means multiple threads are doing things at the same time. This leads to all kinds of weird things that you have to be careful of.
Consider the line:
count++;
This one C# instruction actually represents multiple operations:
1. Load the value of the count variable from memory into the processor.
2. Add 1 to the value loaded into the processor.
3. Store the result back into the memory location of the count variable.
Now imagine two threads performing these three steps at the same time. There's a slight possibility that both of them will complete step 1 before either completes step 3. That means that if count started at zero, both threads will now set count to 1, which isn't what you intended.
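The lost-update effect described above can be reproduced with a small experiment. This is only a sketch: the class and method names (CounterDemo, UnsafeCount, AtomicCount) are made up for illustration, and how many updates are lost will vary by machine and run.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class CounterDemo
{
    // Increments a shared counter from many threads without synchronization.
    // The result can be lower than 'iterations' because increments may be lost.
    public static int UnsafeCount(int iterations)
    {
        int count = 0;
        Parallel.For(0, iterations, i => { count++; });
        return count;
    }

    // Same loop, but each increment is a single atomic read-modify-write.
    public static int AtomicCount(int iterations)
    {
        int count = 0;
        Parallel.For(0, iterations, i => { Interlocked.Increment(ref count); });
        return count;
    }

    public static void Main()
    {
        const int iterations = 1000000;
        // Frequently prints a number below 1000000 on a multi-core machine.
        Console.WriteLine("Unsynchronized: " + UnsafeCount(iterations));
        // Prints 1000000, because no increment is lost.
        Console.WriteLine("Interlocked:    " + AtomicCount(iterations));
    }
}
```

The unsynchronized total can never exceed the number of iterations; lost updates only make it smaller.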
This line has many more steps between the point where log is read and the point where it is written:
log += "\t" + count + " Endpoint Name " + eachendpoint + "\n";
Therefore, you'll find that it's much more frequent for one thread to overwrite (rather than add to) the value already written by another thread. That's the behavior you're noticing.
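To see the overwriting in isolation, here is a rough sketch that appends to a shared string from a parallel loop and counts how many lines survive. LogDemo and LinesSurviving are hypothetical names, and the exact number of surviving lines will differ from run to run.

```csharp
using System;
using System.Threading.Tasks;

public static class LogDemo
{
    // Appends one line per item to a shared string and returns how many lines
    // are present at the end. Anything below 'items' means some appends were
    // overwritten by another thread.
    public static int LinesSurviving(int items)
    {
        string log = "";
        Parallel.For(0, items, i =>
        {
            // Roughly equivalent to:
            //   var old = log;                          (read)
            //   var updated = old + "item " + i + "\n"; (build a new string)
            //   log = updated;                          (write)
            // Another thread may write between the read and the write.
            log += "item " + i + "\n";
        });
        return log.Split(new[] { '\n' }, StringSplitOptions.RemoveEmptyEntries).Length;
    }

    public static void Main()
    {
        Console.WriteLine("Lines kept: " + LinesSurviving(10000) + " of 10000");
    }
}
```

Each surviving line is intact, since strings are immutable and the assignment replaces the whole reference; what goes missing is entire lines, which matches the output in the question.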
... and let me know, what can be done to overcome this.
First, avoid parallel processing whenever possible.
If things are going fast enough with a simple foreach loop, don't try to optimize them.
If things are not going fast enough with a simple foreach loop, figure out why. Most of the time, it'll be because of I/O operations (disk or network accesses). In those cases, use concurrent execution of asynchronous tasks rather than multithreading. See https://stackoverflow.com/a/14130314/120955 and "What is the difference between asynchronous programming and multithreading?".
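For the I/O-bound case, concurrent asynchronous execution might look something like the sketch below. CallEndpointAsync is a hypothetical stand-in for a real network or disk call (Task.Delay simulates the wait), and AsyncDemo and CallAllAsync are likewise invented names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class AsyncDemo
{
    // Hypothetical I/O operation; Task.Delay stands in for the real wait.
    public static async Task<string> CallEndpointAsync(string endpoint)
    {
        await Task.Delay(100);
        return "Response from " + endpoint;
    }

    // Starts every call first, then awaits them all together.
    // No thread is blocked while the calls are in flight.
    public static async Task<string[]> CallAllAsync(IEnumerable<string> endpoints)
    {
        List<Task<string>> pending = endpoints.Select(e => CallEndpointAsync(e)).ToList();
        return await Task.WhenAll(pending);
    }

    public static async Task Main()
    {
        var endpoints = new[] { "Module_2_Endpoint_1", "Module_2_Endpoint_2", "Module_2_Endpoint_3" };
        foreach (string line in await CallAllAsync(endpoints))
        {
            Console.WriteLine(line);
        }
    }
}
```

Task.WhenAll returns the results in the same order as the tasks it was given, and no shared variable is mutated along the way, so there is nothing to lock.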
If you're performing operations that require CPU power, and you really need them to run in parallel to squeeze that extra bit of performance out of them, try to avoid changing state in each one (e.g. setting values for shared variables, like count++). One good strategy for this is Command/Query Separation, where you do your parallel processing on immutable data structures to produce "answers", and then use those answers to make the changes that must be made all on the same thread. Here's how that might look in your code:
var logs = RunModuleConfigVariables
    .AsParallel()
    .WithDegreeOfParallelism(3)
    .Select(e =>
        "Module Name " + e.Key + " thread: " + Thread.CurrentThread.ManagedThreadId + "\n"
        + string.Join("\n",
            e.Value
                .AsParallel()
                .WithDegreeOfParallelism(10)
                .Select((eachendpoint, index) => "\t" + index + " Endpoint Name " + eachendpoint)
        ));
Console.WriteLine(string.Join("\n", logs));
Finally, if you absolutely must change state in parallel, you need to take the time to learn about locks, Mutexes, Concurrent Collections, atomic operations, and other similar tools, and make sure you're only using thread-safe methods in parallel contexts, in order to make sure you're doing it "right."
That might lead to something like this:
Parallel.ForEach(RunModuleConfigVariables, new ParallelOptions { MaxDegreeOfParallelism = 3 }, pair =>
{
    Console.WriteLine("Module Name " + pair.Key + " thread: " + Thread.CurrentThread.ManagedThreadId);
    var count = 0;
    Parallel.ForEach(pair.Value, new ParallelOptions { MaxDegreeOfParallelism = 10 }, eachendpoint =>
    {
        // Interlocked.Increment returns the incremented value; subtract 1 to number from zero.
        var thisCount = Interlocked.Increment(ref count) - 1;
        // Console.WriteLine already appends a newline, so none is added here.
        Console.WriteLine("\t" + thisCount + " Endpoint Name " + eachendpoint);
    });
});
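Because several modules write to the console at once in the code above, lines from different modules can end up interleaved. One possible variation, sketched here under the assumption that each module's lines should stay together, collects the lines in a concurrent collection and prints once per module. ModuleLogger and BuildLog are hypothetical names, and the sample data is a trimmed-down version of the collection in the question.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class ModuleLogger
{
    // Builds the log text for one module. Endpoints are processed in parallel,
    // and each line goes into a thread-safe queue instead of a shared string.
    public static string BuildLog(string module, IEnumerable<string> endpoints)
    {
        var lines = new ConcurrentQueue<string>();
        int count = -1; // so that the first Interlocked.Increment yields 0
        Parallel.ForEach(endpoints, new ParallelOptions { MaxDegreeOfParallelism = 10 }, endpoint =>
        {
            int thisCount = Interlocked.Increment(ref count);
            lines.Enqueue("\t" + thisCount + " Endpoint Name " + endpoint);
        });
        return "Module Name " + module + "\n" + string.Join("\n", lines);
    }

    public static void Main()
    {
        var config = new ConcurrentDictionary<string, HashSet<string>>();
        config["Module_2"] = new HashSet<string> { "Module_2_Endpoint_1", "Module_2_Endpoint_2", "Module_2_Endpoint_3" };
        config["Module_3"] = new HashSet<string> { "Module_3_Endpoint_1" };

        Parallel.ForEach(config, new ParallelOptions { MaxDegreeOfParallelism = 3 }, pair =>
        {
            // A single WriteLine call per module keeps that module's lines together.
            Console.WriteLine(BuildLog(pair.Key, pair.Value));
        });
    }
}
```

Every endpoint gets a unique number and no line is lost, but the lines are not necessarily queued in numeric order, because the increment and the enqueue are two separate steps.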
Upvotes: 4
Reputation: 6604
The problem is that your variable log is being assigned to by multiple threads. You need to take a lock before you attempt to write to it.
Parallel.For(0, RunModuleConfigVariables.Count, new ParallelOptions { MaxDegreeOfParallelism = 3 }, index => {
    string log = null;
    int count = 0;
    log += "Module Name " + RunModuleConfigVariables.Keys.ElementAt(index) + " thread: " + Thread.CurrentThread.ManagedThreadId + "\n";
    object locker = new object();
    Parallel.ForEach(RunModuleConfigVariables[RunModuleConfigVariables.Keys.ElementAt(index)], new ParallelOptions { MaxDegreeOfParallelism = 10 }, eachendpoint => {
        lock(locker)
            log += "\t" + (count++) + " Endpoint Name " + eachendpoint + "\n";
    });
    Console.WriteLine(log);
});
Upvotes: 1