Harsh

Reputation: 23

How to improve performance while processing data from a huge list in C#?

I have a class Test1 which calls a method on class Test2.

public class Test1
{
    public void Testmethod1(List<InputData> request)
    {
        // get data from SQL: huge list, more than 150K inputs
        var inputs = new List<InputData>();
        var output = Test2.Testmethod2(inputs);
    }
}

The Test2 class has the processing method below:

public class Test2
{
    // request list count: 200K
    public static List<OutputData> Testmethod2(List<InputData> request)
    {
        var sync = new object();
        var output = new List<OutputData>();
        var output1 = new List<OutputData>();

        // distinct data count: 20K
        var data = request.Select(x => x.Input2).Distinct().ToList();

        // approach 1, plain foreach: processing time 4 hours
        foreach (var n in data)
        {
            output.AddRange(ProcessData(request.Where(x => x.Input1 == n)));
        }

        // approach 2, Parallel.ForEach: processing time also 4 hours
        var options = new ParallelOptions { MaxDegreeOfParallelism = 3 };
        Parallel.ForEach(data, options, n =>
        {
            // the lock covers the whole body, so the work is effectively serialized
            lock (sync)
            {
                output1.AddRange(ProcessData(request.Where(x => x.Input1 == n)));
            }
        });

        return output;
    }

    public static List<OutputData> ProcessData(IEnumerable<InputData> inputData)
    {
        var result = new List<OutputData>();
        // processing on the input data
        return result;
    }
}


public class InputData
{
    public int Input1 { get; set; }
    public int Input2 { get; set; }
    public int Input3 { get; set; }
    public DateTime Input4 { get; set; }
    public int Input5 { get; set; }
    public int Input6 { get; set; }
    public string Input7 { get; set; }
    public int Input8 { get; set; }
    public int Input9 { get; set; }
}

public class OutputData
{
    public int Output1 { get; set; }
    public int Output2 { get; set; }
    public int Output3 { get; set; }
    public int Output4 { get; set; }
}

It's taking quite a long time to process the data, around 4 hours, and Parallel.ForEach performs no better than the plain loop. I am considering a Dictionary to store the input data; however, the data is not unique and has no unique key.
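
To illustrate the constraint (the key value 42 below is hypothetical): a Dictionary requires unique keys, while a Lookup allows several rows per key.

// ToDictionary throws ArgumentException when Input1 repeats:
// var map = request.ToDictionary(x => x.Input1);

// ToLookup accepts duplicate keys and groups the rows per key:
var lookup = request.ToLookup(x => x.Input1);
IEnumerable<InputData> rowsForKey = lookup[42]; // 42 is a hypothetical key value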

Is there a better approach that would let us optimize this?

Thanks!

Upvotes: 0

Views: 868

Answers (2)

Sarin

Reputation: 1285

var output =
    (from n in request
     // group items in request by distinct values of Input2
     group n by n.Input2)
    .AsParallel()
    .WithDegreeOfParallelism(4)
    .Select(data => Test2.ProcessData(
        // filter the inputs for this group
        data.Where(x => x.Input1 == data.Key)
    ))
    .Cast<IEnumerable<OutputData>>()
    // combine the per-group outputs
    .Aggregate(Enumerable.Concat)
    // generate the final list
    .ToList();

The idea is to group the request by InputData.Input2 values, process the groups in parallel, and collect all the results.
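
A rough method-syntax equivalent (a sketch along the same lines, not tested against your data): SelectMany flattens the per-group results, which avoids the Cast/Aggregate step.

var output = request
    .GroupBy(x => x.Input2)
    .AsParallel()
    .WithDegreeOfParallelism(4)
    // process each group and flatten the per-group lists into one sequence
    .SelectMany(g => Test2.ProcessData(g.Where(x => x.Input1 == g.Key)))
    .ToList();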

Conceptually, this is a variation of @Panagiotis Kanavos's answer.

Upvotes: 0

Panagiotis Kanavos

Reputation: 131180

Right now, the code is using brute force: it performs a full scan of the 200K-item request list for each of the 20K distinct keys. That's roughly 4 billion iterations.

I suspect performance will improve dramatically just by creating a dictionary, or a lookup if there are multiple items per key, e.g.:

var myIndex = request.ToLookup(x => x.Input1);
var output = new List<OutputData>(20000);
foreach (var n in data)
{
    output.AddRange(ProcessData(myIndex[n]));
}

I specify a capacity in the list constructor to reduce reallocations each time the list's internal buffer gets full. There's no need for a precise value.

If the code is still slow, one approach would be to use Parallel.ForEach or PLINQ, e.g.:

var output = ( from n in data.AsParallel().WithDegreeOfParallelism(3)
               let dt = myIndex[n]
               select ProcessData(dt)
             )
             // flatten the per-key lists into a single result list
             .SelectMany(list => list)
             .ToList();
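
Note that PLINQ doesn't guarantee the order of the results; if output order matters, AsOrdered() can be added after AsParallel().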

Upvotes: 1
