YurikoEX
YurikoEX

Reputation: 105

PLINQ ForAll broken in .NET 4.0 and 4.5

I'm trying to come up with a way to speed up combining massive amounts of objects contained in Lists in the quickest fashion possible. Hoping to take advantage of PLINQ I attempted it but this is not a thread-safe solution. I have tested in VS2010 and VS11Beta in 4.0 and 4.5. This is my sample app. If you change BlowUp() between 1-500 it will usually work. After 500 the wheels come of the track. It will fail in multiple places. Does anyone know the fastest way to solve this problem? (Multidimensional array + PLINQ?)

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

namespace PLinqBlowsUp
{
class Program
{
    static void Main(string[] args)
    {
        BlowUp(5000);
    }

    private static void BlowUp(int blowupNum)
    {
        try
        {
            var theExistingMasterListOfAllRowsOfData = new List<List<KeyValuePair<string, dynamic>>>();

            //Add some test data
            Enumerable.Range(0, blowupNum).AsParallel().ForAll(row => theExistingMasterListOfAllRowsOfData.Add(AddRowToMasterList(row)));


            var aNewRowOfData = new List<KeyValuePair<string, dynamic>>();
            //Add some test data
            var column = new KeyValuePair<string, dynamic>("Title", "MyTitle");
            aNewRowOfData.Add(column);

            var anotherNewRowOfData = new List<KeyValuePair<string, dynamic>>();
            //Add some test data
            var columnA = new KeyValuePair<string, dynamic>("Date", DateTime.Now);
            var columnB = new KeyValuePair<string, dynamic>("ImportantColumn", "ImportantData");
            var columnC = new KeyValuePair<string, dynamic>("VeryImportantColumn", "VeryImportantData");
            anotherNewRowOfData.Add(columnA);
            anotherNewRowOfData.Add(columnB);
            anotherNewRowOfData.Add(columnC);

            //Now the Problem
            aNewRowOfData.AsParallel().ForAll(anrod => theExistingMasterListOfAllRowsOfData.ForEach(temloarod => temloarod.Add(anrod)));
            anotherNewRowOfData.AsParallel().ForAll(anrod => theExistingMasterListOfAllRowsOfData.ForEach(temloarod => temloarod.Add(anrod)));

            //Test for number
            foreach (var masterRow in theExistingMasterListOfAllRowsOfData)
            {
                if (masterRow.Count != 7)
                    throw new Exception("BLOW UP!!!");
            }
        }
        catch (AggregateException ex)
        {
            Console.WriteLine(ex.Message);
        }
    }

    private static List<KeyValuePair<string, dynamic>> AddRowToMasterList(int row)
    {
        var columnA = new KeyValuePair<string, dynamic>("FirstName", "John" + row.ToString());
        var columnB = new KeyValuePair<string, dynamic>("LastName", "Smith" + row.ToString());
        var columnC = new KeyValuePair<string, dynamic>("Ssn", 123456789 + (row*10));

        var list = new List<KeyValuePair<string, dynamic>>();
        list.Add(columnA);
        list.Add(columnB);
        list.Add(columnC);
        return list;
    }
}
}

Upvotes: 1

Views: 610

Answers (3)

Brian Gideon
Brian Gideon

Reputation: 48949

I see two problems.

  • You are calling Add on the theExistingMasterListOfAllRowsOfData instance from more than one thread without any attempt to synchronize access to it.
  • You are calling Add on the individual List<KeyValuePair<string, dynamic>> items from more than one thread without any attempt to synchronize them.

You could use lock to protect the Add methods or use a ConcurrentBag instead. However, neither of these options are that good. The issue here is that this kind of operation cannot be parallelized very well because all of the threads will end up competing for the same lock. I highly suspect that even the low-lock ConcurrentBag will end up being slower than had you just punted on PLINQ and done everything on the main thread to begin with.

Upvotes: 3

BrokenGlass
BrokenGlass

Reputation: 160882

This has nothing to do with PLinq - adding an item to a List<T> is simply not thread-safe. A workable solution that would degrade performance though would be to introduce locking. What you typically would want to do instead is projecting to a new collection as the result of your PLinq statement - introducing side-effects like you did is rather not in the spirit of Linq / functional programming in general and you can run into trouble (as you did).

Upvotes: 4

Kirk Woll
Kirk Woll

Reputation: 77546

PLinq is not a substitute for writing thread-safe code. Your access to theExistingMasterListOfAllRowsOfData is not thread-safe as it is being accessed by all the threads from the thread pool. You could try locking around it, which solved the problem for me:

Enumerable.Range(0, blowupNum).AsParallel().ForAll(row => {
    lock (theExistingMasterListOfAllRowsOfData) {                 
        theExistingMasterListOfAllRowsOfData.Add(AddRowToMasterList(row));
    }
});

However, locking might not be what you're after, since that will introduce bottlenecks.

Upvotes: 2

Related Questions