ldam

Reputation: 4585

Efficiently calculating totals from a file using LINQ

I'm reading a file and turning each line into an instance of a class, let's call it Record, and returning each Record as it is read by using IEnumerable<Record> and yield return.
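A minimal sketch of the kind of reader described above. The question doesn't show the file format or the Record class, so both are assumptions here: each line is taken to hold an integer and a boolean separated by a comma (e.g. "42,true"), and the names `RecordReader` and `ReadRecords` are hypothetical.

```csharp
using System.Collections.Generic;
using System.IO;

// Hypothetical record type using the two field names from the question.
public class Record
{
    public int SomeField { get; set; }
    public bool IsSomethingTrue { get; set; }
}

public static class RecordReader
{
    // Assumed line format: "<int>,<bool>", for example "42,true".
    public static IEnumerable<Record> ReadRecords(string path)
    {
        // Nothing in this method runs until a caller starts enumerating the result.
        using (StreamReader reader = File.OpenText(path))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                string[] parts = line.Split(',');
                // Hand back one Record per line as soon as it is parsed.
                yield return new Record
                {
                    SomeField = int.Parse(parts[0]),
                    IsSomethingTrue = bool.Parse(parts[1])
                };
            }
        }
    }
}
```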

Because of this, the file is only actually read when I perform an operation on the enumeration, such as summing it or iterating through it with a foreach.

I need to go through each record and translate it into a database, but because of a database design that predates me, every record in the database has to store the totals, so I need those totals before I start translating the records.

At the moment I have five separate .Count() or .Sum() operations on my enumeration before I start iterating it (for example, int i = records.Sum(r => r.SomeField) or int j = records.Count(r => r.IsSomethingTrue)). Each of those counts or sums loops through the entire file to calculate its value separately. I'm not happy with this behaviour and would like to find a more efficient way of doing this.
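The repeated reads can be made visible with a small sketch. It swaps the file for a hypothetical in-memory iterator (`PassCounter.Records`) that increments a counter each time enumeration starts; calling Sum and then Count on the same lazy sequence starts it twice.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical record type using the two field names from the question.
public class Record
{
    public int SomeField { get; set; }
    public bool IsSomethingTrue { get; set; }
}

public static class PassCounter
{
    // How many times the sequence below has been enumerated from the start.
    public static int Passes;

    // Stand-in for the file reader: a lazy iterator built with yield return.
    public static IEnumerable<Record> Records()
    {
        Passes++; // runs once per enumeration, not once per call to Records()
        for (int n = 1; n <= 3; n++)
        {
            yield return new Record { SomeField = n, IsSomethingTrue = n % 2 == 1 };
        }
    }

    public static int CountPasses(out int sum, out int countTrue)
    {
        Passes = 0;
        IEnumerable<Record> records = Records();
        sum = records.Sum(r => r.SomeField);               // first full pass
        countTrue = records.Count(r => r.IsSomethingTrue); // second full pass
        return Passes;
    }
}
```

Here CountPasses returns 2; with five such calls the counter would reach 5, one full read of the source per aggregate.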

I am using .NET 3.5 if that makes any difference.

Upvotes: 2

Views: 87

Answers (2)

Mark Shevchenko

Reputation: 8197

You could use your own struct to calculate several values in a single pass through the enumerable.

public struct ComplexAccumulator
{
    public int TotalSumField { get; set; }

    public int CountSomethingTrue { get; set; }
}

Now you can use the Aggregate extension method to accumulate the values:

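// One pass: each step builds a new accumulator from the previous one and the current record.
var totals = records.Aggregate(default(ComplexAccumulator), (a, r) => new ComplexAccumulator
{
    TotalSumField = a.TotalSumField + r.SomeField,
    // The parentheses matter: without them, + binds before ?: and the code does not compile.
    CountSomethingTrue = a.CountSomethingTrue + (r.IsSomethingTrue ? 1 : 0),
});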

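Instead of the struct you could use a suitable Tuple instance, for example something like Tuple<int, int, int>. Note, though, that System.Tuple was introduced in .NET 4.0, so it is not available on .NET 3.5.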
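On .NET 3.5, an anonymous type can play the role of the tuple. This is a swapped-in variation, not part of the answer above: the seed and the lambda both create `new { Sum, CountTrue }` with the same property names, types and order, so C# treats them as the same type. The class and method names are hypothetical, and Record is assumed to have the fields named in the question.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical record type using the two field names from the question.
public class Record
{
    public int SomeField { get; set; }
    public bool IsSomethingTrue { get; set; }
}

public static class AnonymousAccumulator
{
    // Single pass; needs only C# 3 / .NET 3.5 (no System.Tuple).
    // An anonymous type cannot be returned from a method, hence the out parameters.
    public static void Totals(IEnumerable<Record> records, out int sum, out int countTrue)
    {
        var totals = records.Aggregate(
            new { Sum = 0, CountTrue = 0 },
            (a, r) => new
            {
                Sum = a.Sum + r.SomeField,
                CountTrue = a.CountTrue + (r.IsSomethingTrue ? 1 : 0)
            });
        sum = totals.Sum;
        countTrue = totals.CountTrue;
    }
}
```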

Upvotes: 1

usr

Reputation: 171178

Efficiency is not LINQ's strength. Here you need to replace some of the LINQ calls with manual loops.

You seem to need two passes over the data. One for aggregation:

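int sum = 0, countTrue = 0; // ...plus the other three totals
foreach (var item in items) {
    // compute all 5 aggregates here, for example:
    sum += item.SomeField;
    if (item.IsSomethingTrue) countTrue++;
}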

And then one to translate the data:

items.Select(item => Translate(item, aggregates))

Whether you should buffer the items (for example using ToList) depends on whether they fit in available memory: buffering means the file is read only once, while not buffering keeps memory use low but reads the file twice.
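A sketch of the buffered variant, assuming the records fit in memory. `Totals`, `TwoPass`, `Process` and the body of `Translate` are hypothetical; Translate just returns a string here to stand in for building a database row.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical record type using the two field names from the question.
public class Record
{
    public int SomeField { get; set; }
    public bool IsSomethingTrue { get; set; }
}

// Hypothetical holder for the aggregates (two of the five shown).
public class Totals
{
    public int Sum;
    public int CountTrue;
}

public static class TwoPass
{
    public static List<string> Process(IEnumerable<Record> source)
    {
        // Buffer once: the source is enumerated a single time,
        // at the cost of holding every record in memory.
        List<Record> items = source.ToList();

        // Pass 1: compute all aggregates in one loop over the buffered list.
        Totals totals = new Totals();
        foreach (Record item in items)
        {
            totals.Sum += item.SomeField;
            if (item.IsSomethingTrue) totals.CountTrue++;
        }

        // Pass 2: translate each record now that the totals are known.
        return items.Select(item => Translate(item, totals)).ToList();
    }

    // Hypothetical stand-in for the real translation into the database.
    static string Translate(Record item, Totals totals)
    {
        return item.SomeField + "/" + totals.Sum;
    }
}
```

Passing `source` straight to both loops without ToList would also work, but then the file would be read twice.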

You can use Aggregate to perform all 5 aggregations in one pass, but that is not better than a loop in any way: it is slower, needs far more code, and the code is arguably illegible.

Upvotes: 0
