Reputation: 105
I'm trying to come up with a way to speed up combining massive amounts of objects contained in Lists in the quickest fashion possible. Hoping to take advantage of PLINQ I attempted it but this is not a thread-safe solution. I have tested in VS2010 and VS11Beta in 4.0 and 4.5. This is my sample app. If you change BlowUp() between 1-500 it will usually work. After 500 the wheels come of the track. It will fail in multiple places. Does anyone know the fastest way to solve this problem? (Multidimensional array + PLINQ?)
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
namespace PLinqBlowsUp
{
class Program
{
static void Main(string[] args)
{
BlowUp(5000);
}
private static void BlowUp(int blowupNum)
{
try
{
var theExistingMasterListOfAllRowsOfData = new List<List<KeyValuePair<string, dynamic>>>();
//Add some test data
Enumerable.Range(0, blowupNum).AsParallel().ForAll(row => theExistingMasterListOfAllRowsOfData.Add(AddRowToMasterList(row)));
var aNewRowOfData = new List<KeyValuePair<string, dynamic>>();
//Add some test data
var column = new KeyValuePair<string, dynamic>("Title", "MyTitle");
aNewRowOfData.Add(column);
var anotherNewRowOfData = new List<KeyValuePair<string, dynamic>>();
//Add some test data
var columnA = new KeyValuePair<string, dynamic>("Date", DateTime.Now);
var columnB = new KeyValuePair<string, dynamic>("ImportantColumn", "ImportantData");
var columnC = new KeyValuePair<string, dynamic>("VeryImportantColumn", "VeryImportantData");
anotherNewRowOfData.Add(columnA);
anotherNewRowOfData.Add(columnB);
anotherNewRowOfData.Add(columnC);
//Now the Problem
aNewRowOfData.AsParallel().ForAll(anrod => theExistingMasterListOfAllRowsOfData.ForEach(temloarod => temloarod.Add(anrod)));
anotherNewRowOfData.AsParallel().ForAll(anrod => theExistingMasterListOfAllRowsOfData.ForEach(temloarod => temloarod.Add(anrod)));
//Test for number
foreach (var masterRow in theExistingMasterListOfAllRowsOfData)
{
if (masterRow.Count != 7)
throw new Exception("BLOW UP!!!");
}
}
catch (AggregateException ex)
{
Console.WriteLine(ex.Message);
}
}
private static List<KeyValuePair<string, dynamic>> AddRowToMasterList(int row)
{
var columnA = new KeyValuePair<string, dynamic>("FirstName", "John" + row.ToString());
var columnB = new KeyValuePair<string, dynamic>("LastName", "Smith" + row.ToString());
var columnC = new KeyValuePair<string, dynamic>("Ssn", 123456789 + (row*10));
var list = new List<KeyValuePair<string, dynamic>>();
list.Add(columnA);
list.Add(columnB);
list.Add(columnC);
return list;
}
}
}
Upvotes: 1
Views: 610
Reputation: 48949
I see two problems.
Add
on the theExistingMasterListOfAllRowsOfData
instance from more than one thread without any attempt to synchronize access to it.Add
on the individual List<KeyValuePair<string, dynamic>>
items from more than one thread without any attempt to synchronize them.You could use lock
to protect the Add
methods or use a ConcurrentBag
instead. However, neither of these options are that good. The issue here is that this kind of operation cannot be parallelized very well because all of the threads will end up competing for the same lock. I highly suspect that even the low-lock ConcurrentBag
will end up being slower than had you just punted on PLINQ and done everything on the main thread to begin with.
Upvotes: 3
Reputation: 160882
This has nothing to do with PLinq - adding an item to a List<T>
is simply not thread-safe. A workable solution that would degrade performance though would be to introduce locking. What you typically would want to do instead is projecting to a new collection as the result of your PLinq statement - introducing side-effects like you did is rather not in the spirit of Linq / functional programming in general and you can run into trouble (as you did).
Upvotes: 4
Reputation: 77546
PLinq is not a substitute for writing thread-safe code. Your access to theExistingMasterListOfAllRowsOfData
is not thread-safe as it is being accessed by all the threads from the thread pool. You could try locking around it, which solved the problem for me:
Enumerable.Range(0, blowupNum).AsParallel().ForAll(row => {
lock (theExistingMasterListOfAllRowsOfData) {
theExistingMasterListOfAllRowsOfData.Add(AddRowToMasterList(row));
}
});
However, locking might not be what you're after, since that will introduce bottlenecks.
Upvotes: 2