zzandy

Reputation: 2353

Count and foreach yield different results

I've been using a method for splitting collections into batches from this answer - https://stackoverflow.com/a/17598878/1012739:

public static IEnumerable<IEnumerable<T>> Batch<T>(this IEnumerable<T> source, int size) {
    using (IEnumerator<T> enumerator = source.GetEnumerator())
        while (enumerator.MoveNext())
            yield return TakeIEnumerator(enumerator, size);
}

private static IEnumerable<T> TakeIEnumerator<T>(IEnumerator<T> source, int size) {
    int i = 0;
    do yield return source.Current;
    while (++i < size && source.MoveNext());
}

When iterating over the results of Batch<T> with foreach, one gets the expected number of batches, but when calling Count or ToList, the length of the original collection (10 here) is reported instead:

var collection = new int[10];
var count = 0;
foreach(var batch in collection.Batch(2))
    ++count;
Assert.AreEqual(5, count); // Passes
// But
Assert.AreEqual(5, collection.Batch(2).Count());        // Fails
Assert.AreEqual(5, collection.Batch(2).ToList().Count); // Fails

How does this work, and is there a way to fix it?

Upvotes: 2

Views: 213

Answers (1)

Marc Gravell

Reputation: 1064134

Your TakeIEnumerator<T> method is dependent upon the position of the enumerator (source), and thus is timing dependent... on itself. If the results are iterated by collating the "outer" results first, i.e.

var batches = source.Batch(24).ToList();
// then iterate in any way

then by definition, source is exhausted before any batch is read: each outer MoveNext() consumes exactly one item, so you'll get N items in batches, where N is the number of items in source, and none of the batches will hold usable data because there is no more data to read. If, however, the results are iterated depth first, i.e.

foreach (var batch in source.Batch(24)) {
    foreach (var item in batch) {...}
}

then you are looking at the open cursor: each inner loop advances the shared enumerator before the outer loop moves on, so the batches come out as expected. Ultimately, this approach is inherently brittle and dangerous. IMO your batch method should create buffers of computed data, perhaps a List<T> or similar. This will allocate, but it'll be reliable. For example:

private static IEnumerable<T> TakeIEnumerator<T>(IEnumerator<T> source, int size) {
    var buffer = new List<T>(size);
    int i = 0;
    do buffer.Add(source.Current);
    while (++i < size && source.MoveNext());
    return buffer;
}
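
To show how the buffered helper fits together with the outer method, here is a minimal self-contained sketch. The names BatchBuffered, TakeBuffered, BatchExtensions and Demo are made up for illustration and are not from the linked answer, and the IReadOnlyList<T> return type is just one reasonable choice. Because each batch is copied into a list before it is yielded, the result no longer depends on whether the caller reads the inner sequences:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BatchExtensions
{
    // Hypothetical name, chosen to avoid clashing with the original Batch<T>.
    public static IEnumerable<IReadOnlyList<T>> BatchBuffered<T>(this IEnumerable<T> source, int size)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (size <= 0) throw new ArgumentOutOfRangeException(nameof(size));
        return Iterate(source, size);
    }

    private static IEnumerable<IReadOnlyList<T>> Iterate<T>(IEnumerable<T> source, int size)
    {
        using (IEnumerator<T> enumerator = source.GetEnumerator())
            while (enumerator.MoveNext())
                // The batch is fully materialized here, before the outer
                // sequence advances, so nothing is left pointing at the cursor.
                yield return TakeBuffered(enumerator, size);
    }

    // Not an iterator method: it runs to completion as soon as it is called.
    private static List<T> TakeBuffered<T>(IEnumerator<T> source, int size)
    {
        var buffer = new List<T>(size);
        do buffer.Add(source.Current);
        while (buffer.Count < size && source.MoveNext());
        return buffer;
    }
}

public static class Demo
{
    public static void Main()
    {
        var collection = new int[10];
        Console.WriteLine(collection.BatchBuffered(2).Count());        // 5
        Console.WriteLine(collection.BatchBuffered(2).ToList().Count); // 5
        Console.WriteLine(collection.BatchBuffered(3).Last().Count);   // 1 (10 = 3 + 3 + 3 + 1)
    }
}
```

The trade-off is the one already mentioned: one List<T> allocation per batch, in exchange for Count, ToList and foreach all agreeing.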

Upvotes: 3
