Reputation: 2353
I've been using a method for splitting collections into batches from this answer - https://stackoverflow.com/a/17598878/1012739:
public static IEnumerable<IEnumerable<T>> Batch<T>(this IEnumerable<T> source, int size) {
    using (IEnumerator<T> enumerator = source.GetEnumerator())
        while (enumerator.MoveNext())
            yield return TakeIEnumerator(enumerator, size);
}
private static IEnumerable<T> TakeIEnumerator<T>(IEnumerator<T> source, int size) {
    int i = 0;
    do yield return source.Current;
    while (++i < size && source.MoveNext());
}
When iterating over the results of Batch<T>, one gets the expected number of collections, but when calling Count or ToList, the length of the source collection is reported instead:
var collection = new int[10];
var count = 0;
foreach (var batch in collection.Batch(2))
    ++count;
Assert.AreEqual(5, count); // Passes
// But
Assert.AreEqual(5, collection.Batch(2).Count()); // Fails (actual: 10)
Assert.AreEqual(5, collection.Batch(2).ToList().Count); // Fails (actual: 10)
How does this work, and is there a way to fix it?
Upvotes: 2
Views: 213
Reputation: 1064134
Your TakeIEnumerator<T> method is dependent upon the position of the enumerator (source), and thus is timing dependent... on itself. If the results are iterated by collating the "outer" results first, i.e.
var batches = source.Batch(24).ToList();
// then iterate in any way
then by definition, source is exhausted, and you'll get N items in batches, where N is the number of items in source, and all the batches will be empty because there is no more data. If, however, the results are iterated depth first, i.e.
foreach (var batch in source.Batch(24)) {
    foreach (var item in batch) {...}
}
then you are looking at the open cursor. Ultimately, this approach is inherently brittle and dangerous. IMO your batch method should create buffers of computed data, perhaps a List<T> or similar. This will allocate, but it'll be reliable. For example:
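private static IEnumerable<T> TakeIEnumerator<T>(IEnumerator<T> source, int size) {
    // Copy up to `size` items into a buffer right away, so the batch
    // no longer depends on where the shared enumerator is later on.
    var buffer = new List<T>(size);
    int i = 0;
    do buffer.Add(source.Current);
    while (++i < size && source.MoveNext());
    return buffer;
}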
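If you'd rather not pass the enumerator around at all, the same buffering idea can be written as one self-contained extension method. This is only a sketch: the names BatchBuffered and Iterator are made up for illustration, and the argument checks are my own assumption about what you'd want. Each batch is fully copied into its own List<T> before it is yielded, so Count() and ToList() see the same batches that a nested foreach does.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BatchExtensions
{
    // Hypothetical name: splits source into lists of at most `size` items.
    public static IEnumerable<IReadOnlyList<T>> BatchBuffered<T>(this IEnumerable<T> source, int size)
    {
        // Validate eagerly, before the lazy iterator below is created.
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (size <= 0) throw new ArgumentOutOfRangeException(nameof(size));
        return Iterator(source, size);
    }

    private static IEnumerable<IReadOnlyList<T>> Iterator<T>(IEnumerable<T> source, int size)
    {
        var buffer = new List<T>(size);
        foreach (var item in source)
        {
            buffer.Add(item);
            if (buffer.Count == size)
            {
                // Hand out the full buffer and start a fresh one,
                // so earlier batches are never mutated.
                yield return buffer;
                buffer = new List<T>(size);
            }
        }
        // Emit the final, possibly shorter, batch.
        if (buffer.Count > 0) yield return buffer;
    }
}

public static class Program
{
    public static void Main()
    {
        var collection = new int[10];
        Console.WriteLine(collection.BatchBuffered(2).Count());        // 5
        Console.WriteLine(collection.BatchBuffered(2).ToList().Count); // 5
        Console.WriteLine(collection.BatchBuffered(3).Last().Count);   // 1
    }
}
```

The outer sequence is still lazy, but each batch is independent of the source enumerator once it has been yielded. If you are on .NET 6 or later, the built-in Enumerable.Chunk does essentially the same thing and returns arrays.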
Upvotes: 3