BlakeH
BlakeH

Reputation: 3494

Create batches in LINQ

Can someone suggest a way to create batches of a certain size in LINQ?

Ideally I want to be able to perform operations in chunks of some configurable amount.

Upvotes: 175

Views: 120012

Answers (21)

infogulch
infogulch

Reputation: 1301

This is a fully lazy, low overhead, one-function implementation of Batch that doesn't do any accumulation and instead forwards iteration steps directly to the source IEnumerable, similar to python's itertools.GroupBy.

This design eliminates copying and buffering, which is nice, but has the following consequences:

  • Elements must be enumerated in order.
  • Elements must not be accessed more than once.
  • Elements that aren't consumed in a batch are explicitly discarded when the next batch is requested.

If these conditions are violated, that is, trying to access elements out of order, a second time, or via a saved iterator, .NET will throw an exception such as InvalidOperationException: Enumeration already finished.. Note that the design and type, IEnumerable<IEnumerable<T>>, naturally implies these conditions; make up your own mind whether that counts as "exception unsafe" or "misuse".

You can test a complete sample at .NET Fiddle.

public static IEnumerable<IEnumerable<T>> Batch<T>(this IEnumerable<T> source, int size)
{
    if (source == null)
        throw new ArgumentNullException("source");
    if (size <= 0)
        throw new ArgumentOutOfRangeException("size", "Must be greater than zero.");
    using (var enumerator = source.GetEnumerator())
        while (enumerator.MoveNext())
        {
            int i = 0;
            
            // Batch is a local iterator function closing over `i` and
            // `enumerator` that executes the inner batch enumeration
            IEnumerable<T> Batch()
            {
                do yield return enumerator.Current;
                while (++i < size && enumerator.MoveNext());
            }

            yield return Batch();
            
            // discard skipped items
            while (++i < size && enumerator.MoveNext());
        }
}


// Buffer() explicitly buffers the contents of intermediate IEnumerables, lifting the conditions of enumerating in order etc.
// but loses the memory and overhead benefits of not copying/buffering.
public static IEnumerable<IReadOnlyList<T>> Buffer<T>(this IEnumerable<IEnumerable<T>> batched) => batched.Select(e => e.ToList());

This solution is based on (and fixes issues in) Nick Whaley's solution with help from EricRoller.

Upvotes: 37

dana
dana

Reputation: 18155

An Enumerable.Chunk() extension method was added to .NET 6.0.

Example:

var list = new List<int> { 1, 2, 3, 4, 5, 6, 7 };

var chunks = list.Chunk(3);
// returns { { 1, 2, 3 }, { 4, 5, 6 }, { 7 } }

For those who cannot upgrade, the source is available on GitHub.

Upvotes: 156

Dave Black
Dave Black

Reputation: 8049

Here is an implementation that uses Async iteration in C# via IAsyncEnumerable - https://learn.microsoft.com/en-us/dotnet/csharp/whats-new/tutorials/generate-consume-asynchronous-stream

public static class EnumerableExtensions
{
    /// <summary>
    /// Chunks a sequence into a sub-sequences each containing maxItemsPerChunk, except for the last
    /// which will contain any items left over.
    ///
    /// NOTE: this implements a streaming implementation via <seealso cref="IAsyncEnumerable{T}"/>.
    /// </summary>
    public static async IAsyncEnumerable<IEnumerable<T>> ChunkAsync<T>(this IAsyncEnumerable<T> sequence, int maxItemsPerChunk)
    {
        if (sequence == null) throw new ArgumentNullException(nameof(sequence));
        if (maxItemsPerChunk <= 0)
        {
            throw new ArgumentOutOfRangeException(nameof(maxItemsPerChunk), $"{nameof(maxItemsPerChunk)} must be greater than 0");
        }

        var chunk = new List<T>(maxItemsPerChunk);
        await foreach (var item in sequence)
        {
            chunk.Add(item);

            if (chunk.Count == maxItemsPerChunk)
            {
                yield return chunk.ToArray();
                chunk.Clear();
            }
        }

        // return the "crumbs" that 
        // didn't make it into a full chunk
        if (chunk.Count > 0)
        {
            yield return chunk.ToArray();
        }
    }

    /// <summary>
    /// Chunks a sequence into a sub-sequences each containing maxItemsPerChunk, except for the last
    /// which will contain any items left over.
    /// </summary>
    public static IEnumerable<IEnumerable<T>> Chunk<T>(this IEnumerable<T> sequence, int maxItemsPerChunk)
    {
        if (sequence == null) throw new ArgumentNullException(nameof(sequence));
        if (maxItemsPerChunk <= 0)
        {
            throw new ArgumentOutOfRangeException(nameof(maxItemsPerChunk), $"{nameof(maxItemsPerChunk)} must be greater than 0");
        }

        var chunk = new List<T>(maxItemsPerChunk);
        foreach (var item in sequence)
        {
            chunk.Add(item);

            if (chunk.Count == maxItemsPerChunk)
            {
                yield return chunk.ToArray();
                chunk.Clear();
            }
        }

        // return the "crumbs" that 
        // didn't make it into a full chunk
        if (chunk.Count > 0)
        {
            yield return chunk.ToArray();
        }
    }
}

Upvotes: 1

Lu4
Lu4

Reputation: 15032

Another way to perform batching:

public static class Extensions
{
    public static IEnumerable<TOut> Batch<T, TOut>(this IEnumerable<T> source, Func<T, T, TOut> func)
    {
        using (var enumerator = source.GetEnumerator())
        {
            while (true)
            {
                bool state;

                state = enumerator.MoveNext(); if (!state) break; var v0 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v1 = enumerator.Current;

                yield return func(v0, v1);
            }
        }
    }

    public static IEnumerable<TOut> Batch<T, TOut>(this IEnumerable<T> source, Func<T, T, T, TOut> func)
    {
        using (var enumerator = source.GetEnumerator())
        {
            while (true)
            {
                bool state;

                state = enumerator.MoveNext(); if (!state) break; var v0 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v1 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v2 = enumerator.Current;

                yield return func(v0, v1, v2);
            }
        }
    }

    public static IEnumerable<TOut> Batch<T, TOut>(this IEnumerable<T> source, Func<T, T, T, T, TOut> func)
    {
        using (var enumerator = source.GetEnumerator())
        {
            while (true)
            {
                bool state;

                state = enumerator.MoveNext(); if (!state) break; var v0 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v1 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v2 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v3 = enumerator.Current;

                yield return func(v0, v1, v2, v3);
            }
        }
    }

    public static IEnumerable<TOut> Batch<T, TOut>(this IEnumerable<T> source, Func<T, T, T, T, T, TOut> func)
    {
        using (var enumerator = source.GetEnumerator())
        {
            while (true)
            {
                bool state;

                state = enumerator.MoveNext(); if (!state) break; var v0 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v1 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v2 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v3 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v4 = enumerator.Current;

                yield return func(v0, v1, v2, v3, v4);
            }
        }
    }

    public static IEnumerable<TOut> Batch<T, TOut>(this IEnumerable<T> source, Func<T, T, T, T, T, T, TOut> func)
    {
        using (var enumerator = source.GetEnumerator())
        {
            while (true)
            {
                bool state;

                state = enumerator.MoveNext(); if (!state) break; var v0 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v1 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v2 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v3 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v4 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v5 = enumerator.Current;

                yield return func(v0, v1, v2, v3, v4, v5);
            }
        }
    }

    public static IEnumerable<TOut> Batch<T, TOut>(this IEnumerable<T> source, Func<T, T, T, T, T, T, T, TOut> func)
    {
        using (var enumerator = source.GetEnumerator())
        {
            while (true)
            {
                bool state;

                state = enumerator.MoveNext(); if (!state) break; var v0 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v1 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v2 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v3 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v4 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v5 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v6 = enumerator.Current;

                yield return func(v0, v1, v2, v3, v4, v5, v6);
            }
        }
    }

    public static IEnumerable<TOut> Batch<T, TOut>(this IEnumerable<T> source, Func<T, T, T, T, T, T, T, T, TOut> func)
    {
        using (var enumerator = source.GetEnumerator())
        {
            while (true)
            {
                bool state;

                state = enumerator.MoveNext(); if (!state) break; var v0 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v1 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v2 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v3 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v4 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v5 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v6 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v7 = enumerator.Current;

                yield return func(v0, v1, v2, v3, v4, v5, v6, v7);
            }
        }
    }

    public static IEnumerable<TOut> Batch<T, TOut>(this IEnumerable<T> source, Func<T, T, T, T, T, T, T, T, T, TOut> func)
    {
        using (var enumerator = source.GetEnumerator())
        {
            while (true)
            {
                bool state;

                state = enumerator.MoveNext(); if (!state) break; var v0 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v1 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v2 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v3 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v4 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v5 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v6 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v7 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v8 = enumerator.Current;

                yield return func(v0, v1, v2, v3, v4, v5, v6, v7, v8);
            }
        }
    }

    public static IEnumerable<TOut> Batch<T, TOut>(this IEnumerable<T> source, Func<T, T, T, T, T, T, T, T, T, T, TOut> func)
    {
        using (var enumerator = source.GetEnumerator())
        {
            while (true)
            {
                bool state;

                state = enumerator.MoveNext(); if (!state) break; var v0 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v1 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v2 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v3 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v4 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v5 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v6 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v7 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v8 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v9 = enumerator.Current;

                yield return func(v0, v1, v2, v3, v4, v5, v6, v7, v8, v9);
            }
        }
    }

    public static IEnumerable<TOut> Batch<T, TOut>(this IEnumerable<T> source, Func<T, T, T, T, T, T, T, T, T, T, T, TOut> func)
    {
        using (var enumerator = source.GetEnumerator())
        {
            while (true)
            {
                bool state;

                state = enumerator.MoveNext(); if (!state) break; var v0 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v1 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v2 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v3 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v4 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v5 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v6 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v7 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v8 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v9 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v10 = enumerator.Current;

                yield return func(v0, v1, v2, v3, v4, v5, v6, v7, v8, v9, v10);
            }
        }
    }

    public static IEnumerable<TOut> Batch<T, TOut>(this IEnumerable<T> source, Func<T, T, T, T, T, T, T, T, T, T, T, T, TOut> func)
    {
        using (var enumerator = source.GetEnumerator())
        {
            while (true)
            {
                bool state;

                state = enumerator.MoveNext(); if (!state) break; var v0 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v1 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v2 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v3 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v4 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v5 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v6 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v7 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v8 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v9 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v10 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v11 = enumerator.Current;

                yield return func(v0, v1, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11);
            }
        }
    }

    public static IEnumerable<TOut> Batch<T, TOut>(this IEnumerable<T> source, Func<T, T, T, T, T, T, T, T, T, T, T, T, T, TOut> func)
    {
        using (var enumerator = source.GetEnumerator())
        {
            while (true)
            {
                bool state;

                state = enumerator.MoveNext(); if (!state) break; var v0 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v1 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v2 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v3 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v4 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v5 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v6 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v7 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v8 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v9 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v10 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v11 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v12 = enumerator.Current;

                yield return func(v0, v1, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12);
            }
        }
    }

    public static IEnumerable<TOut> Batch<T, TOut>(this IEnumerable<T> source, Func<T, T, T, T, T, T, T, T, T, T, T, T, T, T, TOut> func)
    {
        using (var enumerator = source.GetEnumerator())
        {
            while (true)
            {
                bool state;

                state = enumerator.MoveNext(); if (!state) break; var v0 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v1 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v2 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v3 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v4 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v5 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v6 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v7 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v8 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v9 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v10 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v11 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v12 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v13 = enumerator.Current;

                yield return func(v0, v1, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12, v13);
            }
        }
    }

    public static IEnumerable<TOut> Batch<T, TOut>(this IEnumerable<T> source, Func<T, T, T, T, T, T, T, T, T, T, T, T, T, T, T, TOut> func)
    {
        using (var enumerator = source.GetEnumerator())
        {
            while (true)
            {
                bool state;

                state = enumerator.MoveNext(); if (!state) break; var v0 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v1 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v2 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v3 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v4 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v5 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v6 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v7 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v8 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v9 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v10 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v11 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v12 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v13 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v14 = enumerator.Current;

                yield return func(v0, v1, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12, v13, v14);
            }
        }
    }

    public static IEnumerable<TOut> Batch<T, TOut>(this IEnumerable<T> source, Func<T, T, T, T, T, T, T, T, T, T, T, T, T, T, T, T, TOut> func)
    {
        using (var enumerator = source.GetEnumerator())
        {
            while (true)
            {
                bool state;

                state = enumerator.MoveNext(); if (!state) break; var v0 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v1 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v2 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v3 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v4 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v5 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v6 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v7 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v8 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v9 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v10 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v11 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v12 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v13 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v14 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v15 = enumerator.Current;

                yield return func(v0, v1, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12, v13, v14, v15);
            }
        }
    }
}

Here's an example usage:

using System;
using System.Linq;


namespace TestProgram
{
    class Program
    {
        static void Main(string[] args)
        {
            foreach (var item in Enumerable.Range(0, 12).ToArray().Batch((R, X1, Y1, X2, Y2) => (R, X1, Y1, X2, Y2)))
            {
                Console.WriteLine($"{item.R}, {item.X1}, {item.Y1}, {item.X2}, {item.Y2}");
            }
        }
    }
}

Upvotes: 0

Enigmativity
Enigmativity

Reputation: 117124

Here's the cleanest version of Batch that I can come up with:

public static IEnumerable<IEnumerable<T>> Batch<T>(this IEnumerable<T> source, int count)
{
    if (source == null) throw new System.ArgumentNullException("source");
    if (count <= 0) throw new System.ArgumentOutOfRangeException("count");
    using (var enumerator = source.GetEnumerator())
    {
        IEnumerable<T> BatchInner()
        {
            int counter = 0;
            do
                yield return enumerator.Current;
            while (++counter < count && enumerator.MoveNext());
        }
        while (enumerator.MoveNext())
            yield return BatchInner().ToArray();
    }
}

Using this code:

Console.WriteLine(String.Join(Environment.NewLine,
    Enumerable.Range(0, 20).Batch(8).Select(xs => String.Join(",", xs))));

I get:

0,1,2,3,4,5,6,7
8,9,10,11,12,13,14,15
16,17,18,19

It's important to note that on the answers from "" & "" that this code fails:

var e = Enumerable.Range(0, 20).Batch(8).ToArray();

Console.WriteLine(String.Join(Environment.NewLine, e.Select(xs => String.Join(",", xs))));
Console.WriteLine();
Console.WriteLine(String.Join(Environment.NewLine, e.Select(xs => String.Join(",", xs))));

On their answers it gives:

19
19
19

19
19
19

Due to the inner enumerable not being computed as an array.

Upvotes: 3

Majid Shahabfar
Majid Shahabfar

Reputation: 4829

As a new helper method for LINQ in .NET 6 you can chunk any IEnumerable into batches:

int chunkNumber = 1;
foreach (int[] chunk in Enumerable.Range(0, 9).Chunk(3))
{
    Console.WriteLine($"Chunk {chunkNumber++}");
    foreach (var item in chunk)
    {
        Console.WriteLine(item);
    }
}

Upvotes: 1

Ant
Ant

Reputation: 11

An easy version to use and understand.

    public static List<List<T>> chunkList<T>(List<T> listToChunk, int batchSize)
    {
        List<List<T>> batches = new List<List<T>>();

        if (listToChunk.Count == 0) return batches;

        bool moreRecords = true;
        int fromRecord = 0;
        int countRange = 0;
        if (listToChunk.Count >= batchSize)
        {
            countRange = batchSize;
        }
        else
        {
            countRange = listToChunk.Count;
        }

        while (moreRecords)
        {
            List<T> batch = listToChunk.GetRange(fromRecord, countRange);
            batches.Add(batch);

            if ((fromRecord + batchSize) >= listToChunk.Count)
            {
                moreRecords = false;
            }

            fromRecord = fromRecord + batch.Count;

            if ((fromRecord + batchSize) > listToChunk.Count)
            {
                countRange = listToChunk.Count - fromRecord;
            }
            else
            {
                countRange = batchSize;
            }
        }
        return batches;
    }

Upvotes: 1

Mong Zhu
Mong Zhu

Reputation: 23732

I wonder why nobody has ever posted an old school for-loop solution. Here is one:

List<int> source = Enumerable.Range(1,23).ToList();
int batchsize = 10;
for (int i = 0; i < source.Count; i+= batchsize)
{
    var batch = source.Skip(i).Take(batchsize);
}

This simplicity is possible because the Take method:

... enumerates source and yields elements until count elements have been yielded or source contains no more elements. If count exceeds the number of elements in source, all elements of source are returned

Disclaimer:

Using Skip and Take inside the loop means that the enumerable will be enumerated multiple times. This is dangerous if the enumerable is deferred. It may result in multiple executions of a database query, or a web request, or a file read. This example is explicitly for the usage of a List which is not deferred, so it is less of a problem. It is still a slow solution since skip will enumerate the collection each time it is called.

This can also be solved using the GetRange method, but it requires an extra calculation to extract a possible rest batch:

for (int i = 0; i < source.Count; i += batchsize)
{
    int remaining = source.Count - i;
    var batch = remaining > batchsize  ? source.GetRange(i, batchsize) : source.GetRange(i, remaining);
}

Here is a third way to handle this, which works with 2 loops. This ensures that the collection is enumerated only 1 time!:

int batchsize = 10;
List<int> batch = new List<int>(batchsize);

for (int i = 0; i < source.Count; i += batchsize)
{
    // calculated the remaining items to avoid an OutOfRangeException
    batchsize = source.Count - i > batchsize ? batchsize : source.Count - i;
    for (int j = i; j < i + batchsize; j++)
    {
        batch.Add(source[j]);
    }           
    batch.Clear();
}

Upvotes: 14

Sergey Berezovskiy
Sergey Berezovskiy

Reputation: 236268

You don't need to write any code. Use MoreLINQ Batch method, which batches the source sequence into sized buckets (MoreLINQ is available as a NuGet package you can install):

int size = 10;
var batches = sequence.Batch(size);

Which is implemented as:

public static IEnumerable<IEnumerable<TSource>> Batch<TSource>(
                  this IEnumerable<TSource> source, int size)
{
    TSource[] bucket = null;
    var count = 0;

    foreach (var item in source)
    {
        if (bucket == null)
            bucket = new TSource[size];

        bucket[count++] = item;
        if (count != size)
            continue;

        yield return bucket;

        bucket = null;
        count = 0;
    }

    if (bucket != null && count > 0)
        yield return bucket.Take(count).ToArray();
}

Upvotes: 151

Theodor Zoulias
Theodor Zoulias

Reputation: 43748

Here is an attempted improvement of Nick Whaley's (link) and infogulch's (link) lazy Batch implementations. This one is strict. You either enumerate the batches in the correct order, or you get an exception.

public static IEnumerable<IEnumerable<TSource>> Batch<TSource>(
    this IEnumerable<TSource> source, int size)
{
    if (size <= 0) throw new ArgumentOutOfRangeException(nameof(size));
    using (var enumerator = source.GetEnumerator())
    {
        int i = 0;
        while (enumerator.MoveNext())
        {
            if (i % size != 0) throw new InvalidOperationException(
                "The enumeration is out of order.");
            i++;
            yield return GetBatch();
        }
        IEnumerable<TSource> GetBatch()
        {
            while (true)
            {
                yield return enumerator.Current;
                if (i % size == 0 || !enumerator.MoveNext()) break;
                i++;
            }
        }
    }
}

And here is a lazy Batch implementation for sources of type IList<T>. This one imposes no restrictions on the enumeration. The batches can be enumerated partially, in any order, and more than once. The restriction of not modifying the collection during the enumeration is still in place though. This is achieved by making a dummy call to enumerator.MoveNext() before yielding any chunk or element. The downside is that the enumerator is left undisposed, since it is unknown when the enumeration is going to finish.

public static IEnumerable<IEnumerable<TSource>> Batch<TSource>(
    this IList<TSource> source, int size)
{
    if (size <= 0) throw new ArgumentOutOfRangeException(nameof(size));
    var enumerator = source.GetEnumerator();
    for (int i = 0; i < source.Count; i += size)
    {
        enumerator.MoveNext();
        yield return GetChunk(i, Math.Min(i + size, source.Count));
    }
    IEnumerable<TSource> GetChunk(int from, int toExclusive)
    {
        for (int j = from; j < toExclusive; j++)
        {
            enumerator.MoveNext();
            yield return source[j];
        }
    }
}

Upvotes: 4

MrD at KookerellaLtd
MrD at KookerellaLtd

Reputation: 2797

So with a functional hat on, this appears trivial....but in C#, there are some significant downsides.

you'd probably view this as an unfold of IEnumerable (google it and you'll probably end up in some Haskell docs, but there may be some F# stuff using unfold, if you know F#, squint at the Haskell docs and it will make sense).

Unfold is related to fold ("aggregate") except rather than iterating through the input IEnumerable, it iterates through the output data structures (its a similar relationship between IEnumerable and IObservable, in fact I think IObservable does implement an "unfold" called generate...)

anyway first you need an unfold method, I think this works (unfortunately it will eventually blow the stack for large "lists"...you can write this safely in F# using yield! rather than concat);

    static IEnumerable<T> Unfold<T, U>(Func<U, IEnumerable<Tuple<U, T>>> f, U seed)
    {
        var maybeNewSeedAndElement = f(seed);

        return maybeNewSeedAndElement.SelectMany(x => new[] { x.Item2 }.Concat(Unfold(f, x.Item1)));
    }

this is a bit obtuse because C# doesn't implement some of the things functional langauges take for granted...but it basically takes a seed and then generates a "Maybe" answer of the next element in the IEnumerable and the next seed (Maybe doesn't exist in C#, so we've used IEnumerable to fake it), and concatenates the rest of the answer (I can't vouch for the "O(n?)" complexity of this).

Once you've done that then;

    static IEnumerable<IEnumerable<T>> Batch<T>(IEnumerable<T> xs, int n)
    {
        return Unfold(ys =>
            {
                var head = ys.Take(n);
                var tail = ys.Skip(n);
                return head.Take(1).Select(_ => Tuple.Create(tail, head));
            },
            xs);
    }

it all looks quite clean...you take the "n" elements as the "next" element in the IEnumerable, and the "tail" is the rest of the unprocessed list.

if there is nothing in the head...you're over...you return "Nothing" (but faked as an empty IEnumerable>)...else you return the head element and the tail to process.

you probably can do this using IObservable, there's probably a "Batch" like method already there, and you can probably use that.

If the risk of stack overflows worries (it probably should), then you should implement in F# (and there's probably some F# library (FSharpX?) already with this).

(I have only done some rudimentary tests of this, so there may be the odd bugs in there).

Upvotes: 2

frhack
frhack

Reputation: 5112

Another way is using Rx Buffer operator

//using System.Linq;
//using System.Reactive.Linq;
//using System.Reactive.Threading.Tasks;

var observableBatches = anAnumerable.ToObservable().Buffer(size);

var batches = aList.ToObservable().Buffer(size).ToList().ToTask().GetAwaiter().GetResult();

Upvotes: 1

frhack
frhack

Reputation: 5112

Just another one line implementation. It works even with an empty list, in this case you get a zero size batches collection.

var aList = Enumerable.Range(1, 100).ToList(); //a given list
var size = 9; //the wanted batch size
//number of batches are: (aList.Count() + size - 1) / size;

var batches = Enumerable.Range(0, (aList.Count() + size - 1) / size).Select(i => aList.GetRange( i * size, Math.Min(size, aList.Count() - i * size)));

Assert.True(batches.Count() == 12);
Assert.AreEqual(batches.ToList().ElementAt(0), new List<int>() { 1, 2, 3, 4, 5, 6, 7, 8, 9 });
Assert.AreEqual(batches.ToList().ElementAt(1), new List<int>() { 10, 11, 12, 13, 14, 15, 16, 17, 18 });
Assert.AreEqual(batches.ToList().ElementAt(11), new List<int>() { 100 });

Upvotes: 1

Johni Michels
Johni Michels

Reputation: 145

I know everybody used complex systems to do this work, and I really don't get it why. Take and skip will allow all those operations using the common select with Func<TSource,Int32,TResult> transform function. Like:

public IEnumerable<IEnumerable<T>> Buffer<T>(IEnumerable<T> source, int size)=>
    source.Select((item, index) => source.Skip(size * index).Take(size)).TakeWhile(bucket => bucket.Any());

Upvotes: 0

leat
leat

Reputation: 1467

I wrote a custom IEnumerable implementation that works without linq and guarantees a single enumeration over the data. It also accomplishes all this without requiring backing lists or arrays that cause memory explosions over large data sets.

Here are some basic tests:

    [Fact]
    public void ShouldPartition()
    {
        var ints = new List<int> {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
        var data = ints.PartitionByMaxGroupSize(3);
        data.Count().Should().Be(4);

        data.Skip(0).First().Count().Should().Be(3);
        data.Skip(0).First().ToList()[0].Should().Be(0);
        data.Skip(0).First().ToList()[1].Should().Be(1);
        data.Skip(0).First().ToList()[2].Should().Be(2);

        data.Skip(1).First().Count().Should().Be(3);
        data.Skip(1).First().ToList()[0].Should().Be(3);
        data.Skip(1).First().ToList()[1].Should().Be(4);
        data.Skip(1).First().ToList()[2].Should().Be(5);

        data.Skip(2).First().Count().Should().Be(3);
        data.Skip(2).First().ToList()[0].Should().Be(6);
        data.Skip(2).First().ToList()[1].Should().Be(7);
        data.Skip(2).First().ToList()[2].Should().Be(8);

        data.Skip(3).First().Count().Should().Be(1);
        data.Skip(3).First().ToList()[0].Should().Be(9);
    }

The Extension Method to partition the data.

/// <summary>
/// A set of extension methods for <see cref="IEnumerable{T}"/>. 
/// </summary>
public static class EnumerableExtender
{
    /// <summary>
    /// Splits an enumerable into chucks, by a maximum group size.
    /// </summary>
    /// <param name="source">The source to split</param>
    /// <param name="maxSize">The maximum number of items per group.</param>
    /// <typeparam name="T">The type of item to split</typeparam>
    /// <returns>A list of lists of the original items.</returns>
    public static IEnumerable<IEnumerable<T>> PartitionByMaxGroupSize<T>(this IEnumerable<T> source, int maxSize)
    {
        return new SplittingEnumerable<T>(source, maxSize);
    }
}

This is the implementing class

    using System.Collections;
    using System.Collections.Generic;

    internal class SplittingEnumerable<T> : IEnumerable<IEnumerable<T>>
    {
        private readonly IEnumerable<T> backing;
        private readonly int maxSize;
        private bool hasCurrent;
        private T lastItem;

        public SplittingEnumerable(IEnumerable<T> backing, int maxSize)
        {
            this.backing = backing;
            this.maxSize = maxSize;
        }

        public IEnumerator<IEnumerable<T>> GetEnumerator()
        {
            return new Enumerator(this, this.backing.GetEnumerator());
        }

        IEnumerator IEnumerable.GetEnumerator()
        {
            return this.GetEnumerator();
        }

        private class Enumerator : IEnumerator<IEnumerable<T>>
        {
            private readonly SplittingEnumerable<T> parent;
            private readonly IEnumerator<T> backingEnumerator;
            private NextEnumerable current;

            public Enumerator(SplittingEnumerable<T> parent, IEnumerator<T> backingEnumerator)
            {
                this.parent = parent;
                this.backingEnumerator = backingEnumerator;
                this.parent.hasCurrent = this.backingEnumerator.MoveNext();
                if (this.parent.hasCurrent)
                {
                    this.parent.lastItem = this.backingEnumerator.Current;
                }
            }

            public bool MoveNext()
            {
                if (this.current == null)
                {
                    this.current = new NextEnumerable(this.parent, this.backingEnumerator);
                    return true;
                }
                else
                {
                    if (!this.current.IsComplete)
                    {
                        using (var enumerator = this.current.GetEnumerator())
                        {
                            while (enumerator.MoveNext())
                            {
                            }
                        }
                    }
                }

                if (!this.parent.hasCurrent)
                {
                    return false;
                }

                this.current = new NextEnumerable(this.parent, this.backingEnumerator);
                return true;
            }

            public void Reset()
            {
                throw new System.NotImplementedException();
            }

            public IEnumerable<T> Current
            {
                get { return this.current; }
            }

            object IEnumerator.Current
            {
                get { return this.Current; }
            }

            public void Dispose()
            {
            }
        }

        private class NextEnumerable : IEnumerable<T>
        {
            private readonly SplittingEnumerable<T> splitter;
            private readonly IEnumerator<T> backingEnumerator;
            private int currentSize;

            public NextEnumerable(SplittingEnumerable<T> splitter, IEnumerator<T> backingEnumerator)
            {
                this.splitter = splitter;
                this.backingEnumerator = backingEnumerator;
            }

            public bool IsComplete { get; private set; }

            public IEnumerator<T> GetEnumerator()
            {
                return new NextEnumerator(this.splitter, this, this.backingEnumerator);
            }

            IEnumerator IEnumerable.GetEnumerator()
            {
                return this.GetEnumerator();
            }

            private class NextEnumerator : IEnumerator<T>
            {
                private readonly SplittingEnumerable<T> splitter;
                private readonly NextEnumerable parent;
                private readonly IEnumerator<T> enumerator;
                private T currentItem;

                public NextEnumerator(SplittingEnumerable<T> splitter, NextEnumerable parent, IEnumerator<T> enumerator)
                {
                    this.splitter = splitter;
                    this.parent = parent;
                    this.enumerator = enumerator;
                }

                public bool MoveNext()
                {
                    this.parent.currentSize += 1;
                    this.currentItem = this.splitter.lastItem;
                    var hasCcurent = this.splitter.hasCurrent;

                    this.parent.IsComplete = this.parent.currentSize > this.splitter.maxSize;

                    if (this.parent.IsComplete)
                    {
                        return false;
                    }

                    if (hasCcurent)
                    {
                        var result = this.enumerator.MoveNext();

                        this.splitter.lastItem = this.enumerator.Current;
                        this.splitter.hasCurrent = result;
                    }

                    return hasCcurent;
                }

                public void Reset()
                {
                    throw new System.NotImplementedException();
                }

                public T Current
                {
                    get { return this.currentItem; }
                }

                object IEnumerator.Current
                {
                    get { return this.Current; }
                }

                public void Dispose()
                {
                }
            }
        }
    }

Upvotes: 1

Matthew Strawbridge
Matthew Strawbridge

Reputation: 20630

If you start with sequence defined as an IEnumerable<T>, and you know that it can safely be enumerated multiple times (e.g. because it is an array or a list), you can just use this simple pattern to process the elements in batches:

while (sequence.Any())
{
    var batch = sequence.Take(10);
    sequence = sequence.Skip(10);

    // do whatever you need to do with each batch here
}

Upvotes: 50

user4698855
user4698855

Reputation: 3037

Same approach as MoreLINQ, but using List instead of Array. I haven't done benchmarking, but readability matters more to some people:

    public static IEnumerable<IEnumerable<T>> Batch<T>(this IEnumerable<T> source, int size)
    {
        List<T> batch = new List<T>();

        foreach (var item in source)
        {
            batch.Add(item);

            if (batch.Count >= size)
            {
                yield return batch;
                batch.Clear();
            }
        }

        if (batch.Count > 0)
        {
            yield return batch;
        }
    }

Upvotes: 3

Kaushik
Kaushik

Reputation: 2090

I'm joining this very late but i found something more interesting.

So we can use here Skip and Take for better performance.

public static class MyExtensions
    {
        public static IEnumerable<IEnumerable<T>> Batch<T>(this IEnumerable<T> items, int maxItems)
        {
            return items.Select((item, index) => new { item, index })
                        .GroupBy(x => x.index / maxItems)
                        .Select(g => g.Select(x => x.item));
        }

        public static IEnumerable<T> Batch2<T>(this IEnumerable<T> items, int skip, int take)
        {
            return items.Skip(skip).Take(take);
        }

    }

Next I checked with 100000 records. The looping only is taking more time in case of Batch

Code Of console application.

static void Main(string[] args)
{
    List<string> Ids = GetData("First");
    List<string> Ids2 = GetData("tsriF");

    Stopwatch FirstWatch = new Stopwatch();
    FirstWatch.Start();
    foreach (var batch in Ids2.Batch(5000))
    {
        // Console.WriteLine("Batch Ouput:= " + string.Join(",", batch));
    }
    FirstWatch.Stop();
    Console.WriteLine("Done Processing time taken:= "+ FirstWatch.Elapsed.ToString());


    Stopwatch Second = new Stopwatch();

    Second.Start();
    int Length = Ids2.Count;
    int StartIndex = 0;
    int BatchSize = 5000;
    while (Length > 0)
    {
        var SecBatch = Ids2.Batch2(StartIndex, BatchSize);
        // Console.WriteLine("Second Batch Ouput:= " + string.Join(",", SecBatch));
        Length = Length - BatchSize;
        StartIndex += BatchSize;
    }

    Second.Stop();
    Console.WriteLine("Done Processing time taken Second:= " + Second.Elapsed.ToString());
    Console.ReadKey();
}

static List<string> GetData(string name)
{
    List<string> Data = new List<string>();
    for (int i = 0; i < 100000; i++)
    {
        Data.Add(string.Format("{0} {1}", name, i.ToString()));
    }

    return Data;
}

Time taken Is like this.

First - 00:00:00.0708 , 00:00:00.0660

Second (Take and Skip One) - 00:00:00.0008, 00:00:00.0008

Upvotes: 1

nichom
nichom

Reputation: 51

    static IEnumerable<IEnumerable<T>> TakeBatch<T>(IEnumerable<T> ts,int batchSize)
    {
        return from @group in ts.Select((x, i) => new { x, i }).ToLookup(xi => xi.i / batchSize)
               select @group.Select(xi => xi.x);
    }

Upvotes: -3

Nick Whaley
Nick Whaley

Reputation: 2799

All of the above perform terribly with large batches or low memory space. Had to write my own that will pipeline (notice no item accumulation anywhere):

public static class BatchLinq {
    public static IEnumerable<IEnumerable<T>> Batch<T>(this IEnumerable<T> source, int size) {
        if (size <= 0)
            throw new ArgumentOutOfRangeException("size", "Must be greater than zero.");

        using (IEnumerator<T> enumerator = source.GetEnumerator())
            while (enumerator.MoveNext())
                yield return TakeIEnumerator(enumerator, size);
    }

    private static IEnumerable<T> TakeIEnumerator<T>(IEnumerator<T> source, int size) {
        int i = 0;
        do
            yield return source.Current;
        while (++i < size && source.MoveNext());
    }
}

Edit: Known issue with this approach is that each batch must be enumerated and enumerated fully before moving to the next batch. For example this doesn't work:

//Select first item of every 100 items
Batch(list, 100).Select(b => b.First())

Upvotes: 30

L.B
L.B

Reputation: 116168

public static class MyExtensions
{
    public static IEnumerable<IEnumerable<T>> Batch<T>(this IEnumerable<T> items,
                                                       int maxItems)
    {
        return items.Select((item, inx) => new { item, inx })
                    .GroupBy(x => x.inx / maxItems)
                    .Select(g => g.Select(x => x.item));
    }
}

and the usage would be:

List<int> list = new List<int>() { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 };

foreach(var batch in list.Batch(3))
{
    Console.WriteLine(String.Join(",",batch));
}

OUTPUT:

0,1,2
3,4,5
6,7,8
9

Upvotes: 126

Related Questions