Reputation: 23
I have a class Test1 that calls a Test2 class method.
public class Test1
{
    public void Testmethod1(List<InputData> request)
    {
        // Get data from SQL: huge list, more than 150K inputs
        var inputs = new List<InputData>();
        var output = Test2.Testmethod2(inputs);
    }
}
The Test2 class has the processing method below:
public class Test2
{
    // Request list count: 200K
    public static List<OutputData> Testmethod2(List<InputData> request)
    {
        object sync = new Object();
        var output = new List<OutputData>();
        var output1 = new List<OutputData>();

        // Data count: 20K
        var data = request.Select(x => x.Input2).Distinct().ToList();

        // Method calls using foreach: processing time 4 hours
        foreach (var n in data)
        {
            output.AddRange(ProcessData(request.Where(x => x.Input1 == n)));
        }

        // Method calls using Parallel.ForEach: processing time still 4 hours
        var options = new ParallelOptions { MaxDegreeOfParallelism = 3 };
        Parallel.ForEach(data, options, n =>
        {
            lock (sync)
            {
                output1.AddRange(ProcessData(request.Where(x => x.Input1 == n)));
            }
        });

        return output;
    }

    public static List<OutputData> ProcessData(IEnumerable<InputData> inputData)
    {
        var result = new List<OutputData>();
        // Processing on the input data
        return result;
    }
}
public class InputData
{
    public int Input1 { get; set; }
    public int Input2 { get; set; }
    public int Input3 { get; set; }
    public DateTime Input4 { get; set; }
    public int Input5 { get; set; }
    public int Input6 { get; set; }
    public string Input7 { get; set; }
    public int Input8 { get; set; }
    public int Input9 { get; set; }
}
public class OutputData
{
    public int Ouputt1 { get; set; }
    public int Output2 { get; set; }
    public int Output3 { get; set; }
    public int output4 { get; set; }
}
It takes quite a long time to process the data, around 4 hours, and Parallel.ForEach performs no better than the sequential loop. I am thinking of using a Dictionary to store the input data; however, the data is not unique and has no unique row.
Is there a better approach to optimize this?
Thanks!
Upvotes: 0
Views: 868
Reputation: 1285
(from n in request
 // Group items in request by unique values of Input2
 group n by n.Input2)
    .AsParallel()
    .WithDegreeOfParallelism(4)
    .Select(data => Test2.ProcessData(
        // Filter inputs
        data.Where(x => x.Input1 == data.Key)
    ))
    .Cast<IEnumerable<OutputData>>()
    // Combine the output
    .Aggregate(Enumerable.Concat)
    // Generate the final list
    .ToList();
The idea is to group request by InputData.Input2 values, process the batches in parallel, and collect all the results.
Conceptually, this is a variation of @[Panagiotis Kanavos]'s answer.
Upvotes: 0
Reputation: 131180
Right now, the code is using brute force: for each of the 20K distinct keys it performs a full scan of the 200K-item request list. That's roughly 4 billion iterations.
I suspect performance will improve far more simply by creating a dictionary or a lookup (if there are multiple items per key), eg:
var myIndex = request.ToLookup(x => x.Input1);
var output = new List<OutputData>(20000);
foreach (var n in data)
{
    output.AddRange(ProcessData(myIndex[n]));
}
I specify a capacity in the list constructor to reduce reallocations each time the list's internal buffer fills up. There's no need for a precise value.
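A lookup also addresses the "data is not unique" concern from the question: unlike ToDictionary, ToLookup groups all items that share a key instead of throwing on duplicates. A minimal sketch with made-up sample values:

```csharp
using System;
using System.Linq;

class LookupDemo
{
    static void Main()
    {
        // Duplicate keys are fine: ToLookup groups them,
        // whereas ToDictionary would throw on the second key 1.
        var items = new[]
        {
            new { Input1 = 1, Input7 = "a" },
            new { Input1 = 1, Input7 = "b" },
            new { Input1 = 2, Input7 = "c" },
        };

        var index = items.ToLookup(x => x.Input1);

        Console.WriteLine(index[1].Count()); // 2 - both items with key 1
        Console.WriteLine(index[2].Count()); // 1 - the single item with key 2
        Console.WriteLine(index[3].Count()); // 0 - missing keys yield an empty sequence, no exception
    }
}
```

Building the lookup is a single O(n) pass over the request list; every `myIndex[n]` afterwards is a cheap grouped retrieval instead of a full scan.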
If the code is still slow, one approach would be to use Parallel.ForEach or PLINQ, eg:
var output = (from n in data.AsParallel().WithDegreeOfParallelism(3)
              let dt = myIndex[n]
              // flatten each batch's results into a single list
              from item in ProcessData(dt)
              select item
             ).ToList();
Upvotes: 1