Reputation: 520
I am having an issue with the performance of a LINQ query and so I created a small simplified example to demonstrate the issue below. The code takes a random list of small integers and returns the list partitioned into several smaller lists each which totals 10 or less.
The problem is that (as I've written this) the code takes exponentially longer with N. This is only an O(N) problem. With N=2500, the code takes over 10 seconds to run on my pc.
I would appriciate greatly if someone could explain what is going on. Thanks, Mark.
int N = 250;
Random r = new Random();
var work = Enumerable.Range(1,N).Select(x => r.Next(0, 6)).ToList();
var chunks = new List<List<int>>();
// work.Dump("All the work."); // LINQPad Print
var workEnumerable = work.AsEnumerable();
Stopwatch sw = Stopwatch.StartNew();
while(workEnumerable.Any()) // or .FirstorDefault() != null
{
int soFar = 0;
var chunk = workEnumerable.TakeWhile( x =>
{
soFar += x;
return (soFar <= 10);
}).ToList();
chunks.Add(chunk); // Commented out makes no difference.
workEnumerable = workEnumerable.Skip(chunk.Count); // <== SUSPECT
}
sw.Stop();
// chunks.Dump("Work Chunks."); // LINQPad Print
sw.Elapsed.Dump("Time elapsed.");
Upvotes: 14
Views: 2498
Reputation: 71563
The Skip()
method and others like it basically create a placeholder object, implementing IEnumerable, that references its parent enumerable and contains the logic to perform the skipping. Skips in loops, therefore, are non-performant, because instead of throwing away elements of the enumerable, like you think they are, they add a new layer of logic that's lazily executed when you actually need the first element after all the ones you've skipped.
You can get around this by calling ToList()
or ToArray()
. This forces "eager" evaluation of the Skip()
method, and really does get rid of the elements you're skipping from the new collection you will be enumerating. That comes at an increased memory cost, and requires all of the elements to be known (so if you're running this on an IEnumerable
that represents an infinite series, good luck).
The second option is to not use Linq, and instead use the IEnumerable
implementation itself, to get and control an IEnumerator
. Then instead of Skip()
, simply call MoveNext()
the necessary number of times.
Upvotes: 2
Reputation: 392921
You effectively keep chaining Skip() onto the same enumerable. In a list of 250, the last chunk will be created from a lazy enumerable with ~25 'Skip' enumerator classes on the front.
You would find things become a lot faster, already if you did
workEnumerable = workEnumerable.Skip(chunk.Count).ToList();
However, I think the whole approach could be altered.
How about using standard LINQ to achieve the same:
using System;
using System.Collections.Generic;
using System.Linq;
public class Program
{
private readonly static Random r = new Random();
public static void Main(string[] args)
{
int N = 250;
var work = Enumerable.Range(1,N).Select(x => r.Next(0, 6)).ToList();
var chunks = work.Select((o,i) => new { Index=i, Obj=o })
.GroupBy(e => e.Index / 10)
.Select(group => group.Select(e => e.Obj).ToList())
.ToList();
foreach(var chunk in chunks)
Console.WriteLine("Chunk: {0}", string.Join(", ", chunk.Select(i => i.ToString()).ToArray()));
}
}
Upvotes: 5
Reputation: 39950
What .Skip()
does is create a new IEnumerable
that loops over the source, and only begins yielding results after the first N
elements. You chain who knows how many of these after each other. Everytime you call .Any()
, you need to loop over all the previously skipped elements again.
Generally speaking, it's a bad idea to set up very complicated operator chains in LINQ and enumerating them repeatedly. Also, since LINQ is a querying API, methods like Skip()
are a bad choice when what you're trying to achieve amounts to modifying a data structure.
Upvotes: 12