Reputation: 3494
Can someone suggest a way to create batches of a certain size in LINQ?
Ideally I want to be able to perform operations in chunks of some configurable amount.
Upvotes: 175
Views: 120012
Reputation: 1301
This is a fully lazy, low-overhead, one-function implementation of Batch that doesn't do any accumulation and instead forwards iteration steps directly to the source IEnumerable, similar to Python's itertools.groupby.
This design eliminates copying and buffering, which is nice, but has the following consequences: each batch must be enumerated fully, in order, and at most once, before moving on to the next. If these conditions are violated, that is, by trying to access elements out of order, a second time, or via a saved iterator, .NET will throw an exception such as InvalidOperationException: Enumeration already finished. Note that the design and type, IEnumerable<IEnumerable<T>>, naturally imply these conditions; make up your own mind whether that counts as "exception unsafe" or "misuse".
You can test a complete sample at .NET Fiddle.
public static IEnumerable<IEnumerable<T>> Batch<T>(this IEnumerable<T> source, int size)
{
    if (source == null)
        throw new ArgumentNullException("source");
    if (size <= 0)
        throw new ArgumentOutOfRangeException("size", "Must be greater than zero.");

    using (var enumerator = source.GetEnumerator())
        while (enumerator.MoveNext())
        {
            int i = 0;
            // Batch is a local iterator function closing over `i` and
            // `enumerator` that executes the inner batch enumeration
            IEnumerable<T> Batch()
            {
                do yield return enumerator.Current;
                while (++i < size && enumerator.MoveNext());
            }

            yield return Batch();
            // discard skipped items
            while (++i < size && enumerator.MoveNext());
        }
}
// Buffer() explicitly buffers the contents of intermediate IEnumerables, lifting the conditions of enumerating in order etc.
// but loses the memory and overhead benefits of not copying/buffering.
public static IEnumerable<IReadOnlyList<T>> Buffer<T>(this IEnumerable<IEnumerable<T>> batched) => batched.Select(e => e.ToList());
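A quick usage sketch of the Batch and Buffer extensions above (this assumes they are defined in a static class that is in scope):

```csharp
// In-order, single-pass consumption is fine:
foreach (var batch in Enumerable.Range(1, 7).Batch(3))
    Console.WriteLine(string.Join(",", batch));
// 1,2,3
// 4,5,6
// 7

// To save batches for later, or enumerate them more than once,
// buffer them first:
var saved = Enumerable.Range(1, 7).Batch(3).Buffer().ToList();
Console.WriteLine(saved.Count); // 3
```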
This solution is based on (and fixes issues in) Nick Whaley's solution with help from EricRoller.
Upvotes: 37
Reputation: 18155
An Enumerable.Chunk() extension method was added to .NET 6.0.
Example:
var list = new List<int> { 1, 2, 3, 4, 5, 6, 7 };
var chunks = list.Chunk(3);
// returns { { 1, 2, 3 }, { 4, 5, 6 }, { 7 } }
For those who cannot upgrade, the source is available on GitHub.
Upvotes: 156
Reputation: 8049
Here is an implementation that uses async iteration in C# via IAsyncEnumerable (see https://learn.microsoft.com/en-us/dotnet/csharp/whats-new/tutorials/generate-consume-asynchronous-stream):
public static class EnumerableExtensions
{
    /// <summary>
    /// Chunks a sequence into sub-sequences, each containing maxItemsPerChunk items, except for
    /// the last, which will contain any items left over.
    ///
    /// NOTE: this is a streaming implementation via <seealso cref="IAsyncEnumerable{T}"/>.
    /// </summary>
    public static async IAsyncEnumerable<IEnumerable<T>> ChunkAsync<T>(this IAsyncEnumerable<T> sequence, int maxItemsPerChunk)
    {
        if (sequence == null) throw new ArgumentNullException(nameof(sequence));
        if (maxItemsPerChunk <= 0)
        {
            throw new ArgumentOutOfRangeException(nameof(maxItemsPerChunk), $"{nameof(maxItemsPerChunk)} must be greater than 0");
        }

        var chunk = new List<T>(maxItemsPerChunk);
        await foreach (var item in sequence)
        {
            chunk.Add(item);
            if (chunk.Count == maxItemsPerChunk)
            {
                yield return chunk.ToArray();
                chunk.Clear();
            }
        }

        // return the "crumbs" that
        // didn't make it into a full chunk
        if (chunk.Count > 0)
        {
            yield return chunk.ToArray();
        }
    }

    /// <summary>
    /// Chunks a sequence into sub-sequences, each containing maxItemsPerChunk items, except for
    /// the last, which will contain any items left over.
    /// </summary>
    public static IEnumerable<IEnumerable<T>> Chunk<T>(this IEnumerable<T> sequence, int maxItemsPerChunk)
    {
        if (sequence == null) throw new ArgumentNullException(nameof(sequence));
        if (maxItemsPerChunk <= 0)
        {
            throw new ArgumentOutOfRangeException(nameof(maxItemsPerChunk), $"{nameof(maxItemsPerChunk)} must be greater than 0");
        }

        var chunk = new List<T>(maxItemsPerChunk);
        foreach (var item in sequence)
        {
            chunk.Add(item);
            if (chunk.Count == maxItemsPerChunk)
            {
                yield return chunk.ToArray();
                chunk.Clear();
            }
        }

        // return the "crumbs" that
        // didn't make it into a full chunk
        if (chunk.Count > 0)
        {
            yield return chunk.ToArray();
        }
    }
}
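For illustration, here is how ChunkAsync might be consumed; RangeAsync is a hypothetical helper used only for this sketch:

```csharp
// Requires: using System; using System.Collections.Generic;
// using System.Threading.Tasks; plus the extension class above in scope.
static async IAsyncEnumerable<int> RangeAsync(int count)
{
    for (int i = 0; i < count; i++)
    {
        await Task.Yield(); // simulate an asynchronous source
        yield return i;
    }
}

// e.g. inside an async Main:
await foreach (var chunk in RangeAsync(7).ChunkAsync(3))
{
    Console.WriteLine(string.Join(",", chunk));
}
// 0,1,2
// 3,4,5
// 6
```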
Upvotes: 1
Reputation: 15032
Another way to perform batching:
public static class Extensions
{
public static IEnumerable<TOut> Batch<T, TOut>(this IEnumerable<T> source, Func<T, T, TOut> func)
{
using (var enumerator = source.GetEnumerator())
{
while (true)
{
bool state;
state = enumerator.MoveNext(); if (!state) break; var v0 = enumerator.Current;
state = enumerator.MoveNext(); if (!state) break; var v1 = enumerator.Current;
yield return func(v0, v1);
}
}
}
public static IEnumerable<TOut> Batch<T, TOut>(this IEnumerable<T> source, Func<T, T, T, TOut> func)
{
using (var enumerator = source.GetEnumerator())
{
while (true)
{
bool state;
state = enumerator.MoveNext(); if (!state) break; var v0 = enumerator.Current;
state = enumerator.MoveNext(); if (!state) break; var v1 = enumerator.Current;
state = enumerator.MoveNext(); if (!state) break; var v2 = enumerator.Current;
yield return func(v0, v1, v2);
}
}
}
public static IEnumerable<TOut> Batch<T, TOut>(this IEnumerable<T> source, Func<T, T, T, T, TOut> func)
{
using (var enumerator = source.GetEnumerator())
{
while (true)
{
bool state;
state = enumerator.MoveNext(); if (!state) break; var v0 = enumerator.Current;
state = enumerator.MoveNext(); if (!state) break; var v1 = enumerator.Current;
state = enumerator.MoveNext(); if (!state) break; var v2 = enumerator.Current;
state = enumerator.MoveNext(); if (!state) break; var v3 = enumerator.Current;
yield return func(v0, v1, v2, v3);
}
}
}
public static IEnumerable<TOut> Batch<T, TOut>(this IEnumerable<T> source, Func<T, T, T, T, T, TOut> func)
{
using (var enumerator = source.GetEnumerator())
{
while (true)
{
bool state;
state = enumerator.MoveNext(); if (!state) break; var v0 = enumerator.Current;
state = enumerator.MoveNext(); if (!state) break; var v1 = enumerator.Current;
state = enumerator.MoveNext(); if (!state) break; var v2 = enumerator.Current;
state = enumerator.MoveNext(); if (!state) break; var v3 = enumerator.Current;
state = enumerator.MoveNext(); if (!state) break; var v4 = enumerator.Current;
yield return func(v0, v1, v2, v3, v4);
}
}
}
public static IEnumerable<TOut> Batch<T, TOut>(this IEnumerable<T> source, Func<T, T, T, T, T, T, TOut> func)
{
using (var enumerator = source.GetEnumerator())
{
while (true)
{
bool state;
state = enumerator.MoveNext(); if (!state) break; var v0 = enumerator.Current;
state = enumerator.MoveNext(); if (!state) break; var v1 = enumerator.Current;
state = enumerator.MoveNext(); if (!state) break; var v2 = enumerator.Current;
state = enumerator.MoveNext(); if (!state) break; var v3 = enumerator.Current;
state = enumerator.MoveNext(); if (!state) break; var v4 = enumerator.Current;
state = enumerator.MoveNext(); if (!state) break; var v5 = enumerator.Current;
yield return func(v0, v1, v2, v3, v4, v5);
}
}
}
public static IEnumerable<TOut> Batch<T, TOut>(this IEnumerable<T> source, Func<T, T, T, T, T, T, T, TOut> func)
{
using (var enumerator = source.GetEnumerator())
{
while (true)
{
bool state;
state = enumerator.MoveNext(); if (!state) break; var v0 = enumerator.Current;
state = enumerator.MoveNext(); if (!state) break; var v1 = enumerator.Current;
state = enumerator.MoveNext(); if (!state) break; var v2 = enumerator.Current;
state = enumerator.MoveNext(); if (!state) break; var v3 = enumerator.Current;
state = enumerator.MoveNext(); if (!state) break; var v4 = enumerator.Current;
state = enumerator.MoveNext(); if (!state) break; var v5 = enumerator.Current;
state = enumerator.MoveNext(); if (!state) break; var v6 = enumerator.Current;
yield return func(v0, v1, v2, v3, v4, v5, v6);
}
}
}
public static IEnumerable<TOut> Batch<T, TOut>(this IEnumerable<T> source, Func<T, T, T, T, T, T, T, T, TOut> func)
{
using (var enumerator = source.GetEnumerator())
{
while (true)
{
bool state;
state = enumerator.MoveNext(); if (!state) break; var v0 = enumerator.Current;
state = enumerator.MoveNext(); if (!state) break; var v1 = enumerator.Current;
state = enumerator.MoveNext(); if (!state) break; var v2 = enumerator.Current;
state = enumerator.MoveNext(); if (!state) break; var v3 = enumerator.Current;
state = enumerator.MoveNext(); if (!state) break; var v4 = enumerator.Current;
state = enumerator.MoveNext(); if (!state) break; var v5 = enumerator.Current;
state = enumerator.MoveNext(); if (!state) break; var v6 = enumerator.Current;
state = enumerator.MoveNext(); if (!state) break; var v7 = enumerator.Current;
yield return func(v0, v1, v2, v3, v4, v5, v6, v7);
}
}
}
public static IEnumerable<TOut> Batch<T, TOut>(this IEnumerable<T> source, Func<T, T, T, T, T, T, T, T, T, TOut> func)
{
using (var enumerator = source.GetEnumerator())
{
while (true)
{
bool state;
state = enumerator.MoveNext(); if (!state) break; var v0 = enumerator.Current;
state = enumerator.MoveNext(); if (!state) break; var v1 = enumerator.Current;
state = enumerator.MoveNext(); if (!state) break; var v2 = enumerator.Current;
state = enumerator.MoveNext(); if (!state) break; var v3 = enumerator.Current;
state = enumerator.MoveNext(); if (!state) break; var v4 = enumerator.Current;
state = enumerator.MoveNext(); if (!state) break; var v5 = enumerator.Current;
state = enumerator.MoveNext(); if (!state) break; var v6 = enumerator.Current;
state = enumerator.MoveNext(); if (!state) break; var v7 = enumerator.Current;
state = enumerator.MoveNext(); if (!state) break; var v8 = enumerator.Current;
yield return func(v0, v1, v2, v3, v4, v5, v6, v7, v8);
}
}
}
public static IEnumerable<TOut> Batch<T, TOut>(this IEnumerable<T> source, Func<T, T, T, T, T, T, T, T, T, T, TOut> func)
{
using (var enumerator = source.GetEnumerator())
{
while (true)
{
bool state;
state = enumerator.MoveNext(); if (!state) break; var v0 = enumerator.Current;
state = enumerator.MoveNext(); if (!state) break; var v1 = enumerator.Current;
state = enumerator.MoveNext(); if (!state) break; var v2 = enumerator.Current;
state = enumerator.MoveNext(); if (!state) break; var v3 = enumerator.Current;
state = enumerator.MoveNext(); if (!state) break; var v4 = enumerator.Current;
state = enumerator.MoveNext(); if (!state) break; var v5 = enumerator.Current;
state = enumerator.MoveNext(); if (!state) break; var v6 = enumerator.Current;
state = enumerator.MoveNext(); if (!state) break; var v7 = enumerator.Current;
state = enumerator.MoveNext(); if (!state) break; var v8 = enumerator.Current;
state = enumerator.MoveNext(); if (!state) break; var v9 = enumerator.Current;
yield return func(v0, v1, v2, v3, v4, v5, v6, v7, v8, v9);
}
}
}
public static IEnumerable<TOut> Batch<T, TOut>(this IEnumerable<T> source, Func<T, T, T, T, T, T, T, T, T, T, T, TOut> func)
{
using (var enumerator = source.GetEnumerator())
{
while (true)
{
bool state;
state = enumerator.MoveNext(); if (!state) break; var v0 = enumerator.Current;
state = enumerator.MoveNext(); if (!state) break; var v1 = enumerator.Current;
state = enumerator.MoveNext(); if (!state) break; var v2 = enumerator.Current;
state = enumerator.MoveNext(); if (!state) break; var v3 = enumerator.Current;
state = enumerator.MoveNext(); if (!state) break; var v4 = enumerator.Current;
state = enumerator.MoveNext(); if (!state) break; var v5 = enumerator.Current;
state = enumerator.MoveNext(); if (!state) break; var v6 = enumerator.Current;
state = enumerator.MoveNext(); if (!state) break; var v7 = enumerator.Current;
state = enumerator.MoveNext(); if (!state) break; var v8 = enumerator.Current;
state = enumerator.MoveNext(); if (!state) break; var v9 = enumerator.Current;
state = enumerator.MoveNext(); if (!state) break; var v10 = enumerator.Current;
yield return func(v0, v1, v2, v3, v4, v5, v6, v7, v8, v9, v10);
}
}
}
public static IEnumerable<TOut> Batch<T, TOut>(this IEnumerable<T> source, Func<T, T, T, T, T, T, T, T, T, T, T, T, TOut> func)
{
using (var enumerator = source.GetEnumerator())
{
while (true)
{
bool state;
state = enumerator.MoveNext(); if (!state) break; var v0 = enumerator.Current;
state = enumerator.MoveNext(); if (!state) break; var v1 = enumerator.Current;
state = enumerator.MoveNext(); if (!state) break; var v2 = enumerator.Current;
state = enumerator.MoveNext(); if (!state) break; var v3 = enumerator.Current;
state = enumerator.MoveNext(); if (!state) break; var v4 = enumerator.Current;
state = enumerator.MoveNext(); if (!state) break; var v5 = enumerator.Current;
state = enumerator.MoveNext(); if (!state) break; var v6 = enumerator.Current;
state = enumerator.MoveNext(); if (!state) break; var v7 = enumerator.Current;
state = enumerator.MoveNext(); if (!state) break; var v8 = enumerator.Current;
state = enumerator.MoveNext(); if (!state) break; var v9 = enumerator.Current;
state = enumerator.MoveNext(); if (!state) break; var v10 = enumerator.Current;
state = enumerator.MoveNext(); if (!state) break; var v11 = enumerator.Current;
yield return func(v0, v1, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11);
}
}
}
public static IEnumerable<TOut> Batch<T, TOut>(this IEnumerable<T> source, Func<T, T, T, T, T, T, T, T, T, T, T, T, T, TOut> func)
{
using (var enumerator = source.GetEnumerator())
{
while (true)
{
bool state;
state = enumerator.MoveNext(); if (!state) break; var v0 = enumerator.Current;
state = enumerator.MoveNext(); if (!state) break; var v1 = enumerator.Current;
state = enumerator.MoveNext(); if (!state) break; var v2 = enumerator.Current;
state = enumerator.MoveNext(); if (!state) break; var v3 = enumerator.Current;
state = enumerator.MoveNext(); if (!state) break; var v4 = enumerator.Current;
state = enumerator.MoveNext(); if (!state) break; var v5 = enumerator.Current;
state = enumerator.MoveNext(); if (!state) break; var v6 = enumerator.Current;
state = enumerator.MoveNext(); if (!state) break; var v7 = enumerator.Current;
state = enumerator.MoveNext(); if (!state) break; var v8 = enumerator.Current;
state = enumerator.MoveNext(); if (!state) break; var v9 = enumerator.Current;
state = enumerator.MoveNext(); if (!state) break; var v10 = enumerator.Current;
state = enumerator.MoveNext(); if (!state) break; var v11 = enumerator.Current;
state = enumerator.MoveNext(); if (!state) break; var v12 = enumerator.Current;
yield return func(v0, v1, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12);
}
}
}
public static IEnumerable<TOut> Batch<T, TOut>(this IEnumerable<T> source, Func<T, T, T, T, T, T, T, T, T, T, T, T, T, T, TOut> func)
{
using (var enumerator = source.GetEnumerator())
{
while (true)
{
bool state;
state = enumerator.MoveNext(); if (!state) break; var v0 = enumerator.Current;
state = enumerator.MoveNext(); if (!state) break; var v1 = enumerator.Current;
state = enumerator.MoveNext(); if (!state) break; var v2 = enumerator.Current;
state = enumerator.MoveNext(); if (!state) break; var v3 = enumerator.Current;
state = enumerator.MoveNext(); if (!state) break; var v4 = enumerator.Current;
state = enumerator.MoveNext(); if (!state) break; var v5 = enumerator.Current;
state = enumerator.MoveNext(); if (!state) break; var v6 = enumerator.Current;
state = enumerator.MoveNext(); if (!state) break; var v7 = enumerator.Current;
state = enumerator.MoveNext(); if (!state) break; var v8 = enumerator.Current;
state = enumerator.MoveNext(); if (!state) break; var v9 = enumerator.Current;
state = enumerator.MoveNext(); if (!state) break; var v10 = enumerator.Current;
state = enumerator.MoveNext(); if (!state) break; var v11 = enumerator.Current;
state = enumerator.MoveNext(); if (!state) break; var v12 = enumerator.Current;
state = enumerator.MoveNext(); if (!state) break; var v13 = enumerator.Current;
yield return func(v0, v1, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12, v13);
}
}
}
public static IEnumerable<TOut> Batch<T, TOut>(this IEnumerable<T> source, Func<T, T, T, T, T, T, T, T, T, T, T, T, T, T, T, TOut> func)
{
using (var enumerator = source.GetEnumerator())
{
while (true)
{
bool state;
state = enumerator.MoveNext(); if (!state) break; var v0 = enumerator.Current;
state = enumerator.MoveNext(); if (!state) break; var v1 = enumerator.Current;
state = enumerator.MoveNext(); if (!state) break; var v2 = enumerator.Current;
state = enumerator.MoveNext(); if (!state) break; var v3 = enumerator.Current;
state = enumerator.MoveNext(); if (!state) break; var v4 = enumerator.Current;
state = enumerator.MoveNext(); if (!state) break; var v5 = enumerator.Current;
state = enumerator.MoveNext(); if (!state) break; var v6 = enumerator.Current;
state = enumerator.MoveNext(); if (!state) break; var v7 = enumerator.Current;
state = enumerator.MoveNext(); if (!state) break; var v8 = enumerator.Current;
state = enumerator.MoveNext(); if (!state) break; var v9 = enumerator.Current;
state = enumerator.MoveNext(); if (!state) break; var v10 = enumerator.Current;
state = enumerator.MoveNext(); if (!state) break; var v11 = enumerator.Current;
state = enumerator.MoveNext(); if (!state) break; var v12 = enumerator.Current;
state = enumerator.MoveNext(); if (!state) break; var v13 = enumerator.Current;
state = enumerator.MoveNext(); if (!state) break; var v14 = enumerator.Current;
yield return func(v0, v1, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12, v13, v14);
}
}
}
public static IEnumerable<TOut> Batch<T, TOut>(this IEnumerable<T> source, Func<T, T, T, T, T, T, T, T, T, T, T, T, T, T, T, T, TOut> func)
{
using (var enumerator = source.GetEnumerator())
{
while (true)
{
bool state;
state = enumerator.MoveNext(); if (!state) break; var v0 = enumerator.Current;
state = enumerator.MoveNext(); if (!state) break; var v1 = enumerator.Current;
state = enumerator.MoveNext(); if (!state) break; var v2 = enumerator.Current;
state = enumerator.MoveNext(); if (!state) break; var v3 = enumerator.Current;
state = enumerator.MoveNext(); if (!state) break; var v4 = enumerator.Current;
state = enumerator.MoveNext(); if (!state) break; var v5 = enumerator.Current;
state = enumerator.MoveNext(); if (!state) break; var v6 = enumerator.Current;
state = enumerator.MoveNext(); if (!state) break; var v7 = enumerator.Current;
state = enumerator.MoveNext(); if (!state) break; var v8 = enumerator.Current;
state = enumerator.MoveNext(); if (!state) break; var v9 = enumerator.Current;
state = enumerator.MoveNext(); if (!state) break; var v10 = enumerator.Current;
state = enumerator.MoveNext(); if (!state) break; var v11 = enumerator.Current;
state = enumerator.MoveNext(); if (!state) break; var v12 = enumerator.Current;
state = enumerator.MoveNext(); if (!state) break; var v13 = enumerator.Current;
state = enumerator.MoveNext(); if (!state) break; var v14 = enumerator.Current;
state = enumerator.MoveNext(); if (!state) break; var v15 = enumerator.Current;
yield return func(v0, v1, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12, v13, v14, v15);
}
}
}
}
Here's an example usage:
using System;
using System.Linq;

namespace TestProgram
{
    class Program
    {
        static void Main(string[] args)
        {
            foreach (var item in Enumerable.Range(0, 12).ToArray().Batch((R, X1, Y1, X2, Y2) => (R, X1, Y1, X2, Y2)))
            {
                Console.WriteLine($"{item.R}, {item.X1}, {item.Y1}, {item.X2}, {item.Y2}");
            }
        }
    }
}
Upvotes: 0
Reputation: 117124
Here's the cleanest version of Batch that I can come up with:
public static IEnumerable<IEnumerable<T>> Batch<T>(this IEnumerable<T> source, int count)
{
    if (source == null) throw new System.ArgumentNullException("source");
    if (count <= 0) throw new System.ArgumentOutOfRangeException("count");

    using (var enumerator = source.GetEnumerator())
    {
        IEnumerable<T> BatchInner()
        {
            int counter = 0;
            do
                yield return enumerator.Current;
            while (++counter < count && enumerator.MoveNext());
        }

        while (enumerator.MoveNext())
            yield return BatchInner().ToArray();
    }
}
Using this code:
Console.WriteLine(String.Join(Environment.NewLine,
Enumerable.Range(0, 20).Batch(8).Select(xs => String.Join(",", xs))));
I get:
0,1,2,3,4,5,6,7
8,9,10,11,12,13,14,15
16,17,18,19
It's important to note that with the answers from "" and "", this code fails:
var e = Enumerable.Range(0, 20).Batch(8).ToArray();
Console.WriteLine(String.Join(Environment.NewLine, e.Select(xs => String.Join(",", xs))));
Console.WriteLine();
Console.WriteLine(String.Join(Environment.NewLine, e.Select(xs => String.Join(",", xs))));
On their answers it gives:
19
19
19
19
19
19
This is because the inner enumerable is not computed as an array.
Upvotes: 3
Reputation: 4829
Using the new helper method added to LINQ in .NET 6, you can chunk any IEnumerable into batches:
int chunkNumber = 1;
foreach (int[] chunk in Enumerable.Range(0, 9).Chunk(3))
{
    Console.WriteLine($"Chunk {chunkNumber++}");
    foreach (var item in chunk)
    {
        Console.WriteLine(item);
    }
}
Upvotes: 1
Reputation: 11
A version that is easy to use and understand.
public static List<List<T>> chunkList<T>(List<T> listToChunk, int batchSize)
{
    List<List<T>> batches = new List<List<T>>();
    if (listToChunk.Count == 0) return batches;

    bool moreRecords = true;
    int fromRecord = 0;
    int countRange = 0;
    if (listToChunk.Count >= batchSize)
    {
        countRange = batchSize;
    }
    else
    {
        countRange = listToChunk.Count;
    }

    while (moreRecords)
    {
        List<T> batch = listToChunk.GetRange(fromRecord, countRange);
        batches.Add(batch);

        if ((fromRecord + batchSize) >= listToChunk.Count)
        {
            moreRecords = false;
        }

        fromRecord = fromRecord + batch.Count;

        if ((fromRecord + batchSize) > listToChunk.Count)
        {
            countRange = listToChunk.Count - fromRecord;
        }
        else
        {
            countRange = batchSize;
        }
    }

    return batches;
}
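A quick usage sketch of the chunkList method above (assuming it sits in a class that is in scope):

```csharp
// 10 items in batches of 3 gives 4 batches; the last holds the remainder.
var batches = chunkList(Enumerable.Range(1, 10).ToList(), 3);
Console.WriteLine(batches.Count);                // 4
Console.WriteLine(string.Join(",", batches[3])); // 10
```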
Upvotes: 1
Reputation: 23732
I wonder why nobody has ever posted an old school for-loop solution. Here is one:
List<int> source = Enumerable.Range(1, 23).ToList();
int batchsize = 10;
for (int i = 0; i < source.Count; i += batchsize)
{
    var batch = source.Skip(i).Take(batchsize);
}
This simplicity is possible because the Take method:

...enumerates source and yields elements until count elements have been yielded or source contains no more elements. If count exceeds the number of elements in source, all elements of source are returned.
Disclaimer:
Using Skip and Take inside the loop means that the enumerable will be enumerated multiple times. This is dangerous if the enumerable is deferred: it may result in multiple executions of a database query, a web request, or a file read. This example is explicitly for the usage of a List, which is not deferred, so it is less of a problem. It is still a slow solution, since Skip will enumerate the collection each time it is called.
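To make the deferred-enumeration danger concrete, here is a small sketch with a hypothetical iterator method, GetNumbers, that counts how often it is enumerated (the way a database query would be re-executed):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Demo
{
    static int iterations = 0;

    // A deferred sequence: the counter increments each time
    // a new enumeration of the sequence begins.
    static IEnumerable<int> GetNumbers()
    {
        iterations++;
        for (int i = 0; i < 23; i++) yield return i;
    }

    static void Main()
    {
        var source = GetNumbers();
        int batchsize = 10;
        for (int i = 0; i < 23; i += batchsize)
        {
            var batch = source.Skip(i).Take(batchsize).ToList();
        }
        Console.WriteLine(iterations); // 3 -- once per batch, not once in total
    }
}
```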
This can also be solved using the GetRange
method, but it requires an extra calculation to extract a possible rest batch:
for (int i = 0; i < source.Count; i += batchsize)
{
    int remaining = source.Count - i;
    var batch = remaining > batchsize ? source.GetRange(i, batchsize) : source.GetRange(i, remaining);
}
Here is a third way to handle this, which works with two loops and ensures that the collection is enumerated only once:
int batchsize = 10;
List<int> batch = new List<int>(batchsize);

for (int i = 0; i < source.Count; i += batchsize)
{
    // calculate the remaining items to avoid an ArgumentOutOfRangeException
    batchsize = source.Count - i > batchsize ? batchsize : source.Count - i;
    for (int j = i; j < i + batchsize; j++)
    {
        batch.Add(source[j]);
    }
    // process the batch here, then clear it for the next round
    batch.Clear();
}
Upvotes: 14
Reputation: 236268
You don't need to write any code. Use the MoreLINQ Batch method, which batches the source sequence into sized buckets (MoreLINQ is available as a NuGet package you can install):
int size = 10;
var batches = sequence.Batch(size);
Which is implemented as:
public static IEnumerable<IEnumerable<TSource>> Batch<TSource>(
    this IEnumerable<TSource> source, int size)
{
    TSource[] bucket = null;
    var count = 0;

    foreach (var item in source)
    {
        if (bucket == null)
            bucket = new TSource[size];

        bucket[count++] = item;
        if (count != size)
            continue;

        yield return bucket;

        bucket = null;
        count = 0;
    }

    if (bucket != null && count > 0)
        yield return bucket.Take(count).ToArray();
}
Upvotes: 151
Reputation: 43748
Here is an attempted improvement of Nick Whaley's (link) and infogulch's (link) lazy Batch implementations. This one is strict: you either enumerate the batches in the correct order, or you get an exception.
public static IEnumerable<IEnumerable<TSource>> Batch<TSource>(
    this IEnumerable<TSource> source, int size)
{
    if (size <= 0) throw new ArgumentOutOfRangeException(nameof(size));
    using (var enumerator = source.GetEnumerator())
    {
        int i = 0;
        while (enumerator.MoveNext())
        {
            if (i % size != 0) throw new InvalidOperationException(
                "The enumeration is out of order.");
            i++;
            yield return GetBatch();
        }

        IEnumerable<TSource> GetBatch()
        {
            while (true)
            {
                yield return enumerator.Current;
                if (i % size == 0 || !enumerator.MoveNext()) break;
                i++;
            }
        }
    }
}
And here is a lazy Batch implementation for sources of type IList<T>. This one imposes no restrictions on the enumeration: the batches can be enumerated partially, in any order, and more than once. The restriction of not modifying the collection during the enumeration is still in place, though. This is achieved by making a dummy call to enumerator.MoveNext() before yielding any chunk or element. The downside is that the enumerator is left undisposed, since it is unknown when the enumeration is going to finish.
public static IEnumerable<IEnumerable<TSource>> Batch<TSource>(
    this IList<TSource> source, int size)
{
    if (size <= 0) throw new ArgumentOutOfRangeException(nameof(size));
    var enumerator = source.GetEnumerator();

    for (int i = 0; i < source.Count; i += size)
    {
        enumerator.MoveNext();
        yield return GetChunk(i, Math.Min(i + size, source.Count));
    }

    IEnumerable<TSource> GetChunk(int from, int toExclusive)
    {
        for (int j = from; j < toExclusive; j++)
        {
            enumerator.MoveNext();
            yield return source[j];
        }
    }
}
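As a quick illustration of the relaxed restrictions (a sketch, assuming the IList<T> overload above is in scope):

```csharp
var list = Enumerable.Range(1, 10).ToList();
var batches = list.Batch(3).ToList(); // materializes only the outer sequence

Console.WriteLine(string.Join(",", batches[3])); // 10    -- out of order is fine
Console.WriteLine(string.Join(",", batches[0])); // 1,2,3
Console.WriteLine(string.Join(",", batches[0])); // 1,2,3 -- and repeatable
```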
Upvotes: 4
Reputation: 2797
So with a functional hat on, this appears trivial... but in C#, there are some significant downsides.
you'd probably view this as an "unfold" of IEnumerable (google it and you'll probably end up in some Haskell docs, but there may be some F# stuff using unfold; if you know F#, squint at the Haskell docs and it will make sense).
Unfold is related to fold ("aggregate"), except rather than iterating through the input IEnumerable, it iterates through the output data structure (it's a similar relationship between IEnumerable and IObservable; in fact I think IObservable does implement an "unfold", called Generate...).
anyway, first you need an unfold method. I think this works (unfortunately it will eventually blow the stack for large "lists"... you can write this safely in F# using yield! rather than Concat):
static IEnumerable<T> Unfold<T, U>(Func<U, IEnumerable<Tuple<U, T>>> f, U seed)
{
    var maybeNewSeedAndElement = f(seed);
    return maybeNewSeedAndElement.SelectMany(x => new[] { x.Item2 }.Concat(Unfold(f, x.Item1)));
}
this is a bit obtuse because C# doesn't implement some of the things functional languages take for granted... but it basically takes a seed and then generates a "Maybe" answer of the next element in the IEnumerable and the next seed (Maybe doesn't exist in C#, so we've used IEnumerable to fake it), and concatenates the rest of the answer (I can't vouch for the "O(n?)" complexity of this).
Once you've done that then;
static IEnumerable<IEnumerable<T>> Batch<T>(IEnumerable<T> xs, int n)
{
    return Unfold(ys =>
        {
            var head = ys.Take(n);
            var tail = ys.Skip(n);
            return head.Take(1).Select(_ => Tuple.Create(tail, head));
        },
        xs);
}
it all looks quite clean... you take the "n" elements as the "next" element in the IEnumerable, and the "tail" is the rest of the unprocessed list.
if there is nothing in the head... you're done... you return "Nothing" (but faked as an empty IEnumerable)... else you return the head element and the tail to process.
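as a sketch, consuming the unfold-based Batch above (assuming both methods are in scope):

```csharp
// 7 items in batches of 3: two full batches plus the remainder.
foreach (var batch in Batch(Enumerable.Range(1, 7), 3))
    Console.WriteLine(string.Join(",", batch));
// 1,2,3
// 4,5,6
// 7
```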
you probably can do this using IObservable, there's probably a "Batch" like method already there, and you can probably use that.
If the risk of stack overflows worries you (it probably should), then you should implement this in F# (and there's probably some F# library (FSharpX?) already with this).
(I have only done some rudimentary tests of this, so there may be the odd bug in there.)
Upvotes: 2
Reputation: 5112
Another way is to use the Rx Buffer operator:
//using System.Linq;
//using System.Reactive.Linq;
//using System.Reactive.Threading.Tasks;
var observableBatches = anEnumerable.ToObservable().Buffer(size);
var batches = aList.ToObservable().Buffer(size).ToList().ToTask().GetAwaiter().GetResult();
Upvotes: 1
Reputation: 5112
Just another one-line implementation. It works even with an empty list; in that case you get a zero-size batch collection.
var aList = Enumerable.Range(1, 100).ToList(); //a given list
var size = 9; //the wanted batch size
//number of batches are: (aList.Count() + size - 1) / size;
var batches = Enumerable.Range(0, (aList.Count() + size - 1) / size).Select(i => aList.GetRange(i * size, Math.Min(size, aList.Count() - i * size)));
Assert.True(batches.Count() == 12);
Assert.AreEqual(batches.ToList().ElementAt(0), new List<int>() { 1, 2, 3, 4, 5, 6, 7, 8, 9 });
Assert.AreEqual(batches.ToList().ElementAt(1), new List<int>() { 10, 11, 12, 13, 14, 15, 16, 17, 18 });
Assert.AreEqual(batches.ToList().ElementAt(11), new List<int>() { 100 });
Upvotes: 1
Reputation: 145
I know everybody used complex systems to do this work, and I really don't get why. Skip and Take allow all those operations using the common Select with a Func<TSource, Int32, TResult> transform function.
Like:
public IEnumerable<IEnumerable<T>> Buffer<T>(IEnumerable<T> source, int size) =>
    source.Select((item, index) => source.Skip(size * index).Take(size)).TakeWhile(bucket => bucket.Any());
Upvotes: 0
Reputation: 1467
I wrote a custom IEnumerable implementation that works without LINQ and guarantees a single enumeration over the data. It also accomplishes all this without requiring backing lists or arrays that cause memory explosions over large data sets.
Here are some basic tests:
[Fact]
public void ShouldPartition()
{
var ints = new List<int> {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
var data = ints.PartitionByMaxGroupSize(3);
data.Count().Should().Be(4);
data.Skip(0).First().Count().Should().Be(3);
data.Skip(0).First().ToList()[0].Should().Be(0);
data.Skip(0).First().ToList()[1].Should().Be(1);
data.Skip(0).First().ToList()[2].Should().Be(2);
data.Skip(1).First().Count().Should().Be(3);
data.Skip(1).First().ToList()[0].Should().Be(3);
data.Skip(1).First().ToList()[1].Should().Be(4);
data.Skip(1).First().ToList()[2].Should().Be(5);
data.Skip(2).First().Count().Should().Be(3);
data.Skip(2).First().ToList()[0].Should().Be(6);
data.Skip(2).First().ToList()[1].Should().Be(7);
data.Skip(2).First().ToList()[2].Should().Be(8);
data.Skip(3).First().Count().Should().Be(1);
data.Skip(3).First().ToList()[0].Should().Be(9);
}
The Extension Method to partition the data.
/// <summary>
/// A set of extension methods for <see cref="IEnumerable{T}"/>.
/// </summary>
public static class EnumerableExtender
{
/// <summary>
/// Splits an enumerable into chunks, by a maximum group size.
/// </summary>
/// <param name="source">The source to split</param>
/// <param name="maxSize">The maximum number of items per group.</param>
/// <typeparam name="T">The type of item to split</typeparam>
/// <returns>A list of lists of the original items.</returns>
public static IEnumerable<IEnumerable<T>> PartitionByMaxGroupSize<T>(this IEnumerable<T> source, int maxSize)
{
return new SplittingEnumerable<T>(source, maxSize);
}
}
This is the implementing class
using System.Collections;
using System.Collections.Generic;
internal class SplittingEnumerable<T> : IEnumerable<IEnumerable<T>>
{
private readonly IEnumerable<T> backing;
private readonly int maxSize;
private bool hasCurrent;
private T lastItem;
public SplittingEnumerable(IEnumerable<T> backing, int maxSize)
{
this.backing = backing;
this.maxSize = maxSize;
}
public IEnumerator<IEnumerable<T>> GetEnumerator()
{
return new Enumerator(this, this.backing.GetEnumerator());
}
IEnumerator IEnumerable.GetEnumerator()
{
return this.GetEnumerator();
}
private class Enumerator : IEnumerator<IEnumerable<T>>
{
private readonly SplittingEnumerable<T> parent;
private readonly IEnumerator<T> backingEnumerator;
private NextEnumerable current;
public Enumerator(SplittingEnumerable<T> parent, IEnumerator<T> backingEnumerator)
{
this.parent = parent;
this.backingEnumerator = backingEnumerator;
this.parent.hasCurrent = this.backingEnumerator.MoveNext();
if (this.parent.hasCurrent)
{
this.parent.lastItem = this.backingEnumerator.Current;
}
}
public bool MoveNext()
{
if (this.current == null)
{
this.current = new NextEnumerable(this.parent, this.backingEnumerator);
return true;
}
else
{
if (!this.current.IsComplete)
{
using (var enumerator = this.current.GetEnumerator())
{
while (enumerator.MoveNext())
{
}
}
}
}
if (!this.parent.hasCurrent)
{
return false;
}
this.current = new NextEnumerable(this.parent, this.backingEnumerator);
return true;
}
public void Reset()
{
throw new System.NotImplementedException();
}
public IEnumerable<T> Current
{
get { return this.current; }
}
object IEnumerator.Current
{
get { return this.Current; }
}
public void Dispose()
{
}
}
private class NextEnumerable : IEnumerable<T>
{
private readonly SplittingEnumerable<T> splitter;
private readonly IEnumerator<T> backingEnumerator;
private int currentSize;
public NextEnumerable(SplittingEnumerable<T> splitter, IEnumerator<T> backingEnumerator)
{
this.splitter = splitter;
this.backingEnumerator = backingEnumerator;
}
public bool IsComplete { get; private set; }
public IEnumerator<T> GetEnumerator()
{
return new NextEnumerator(this.splitter, this, this.backingEnumerator);
}
IEnumerator IEnumerable.GetEnumerator()
{
return this.GetEnumerator();
}
private class NextEnumerator : IEnumerator<T>
{
private readonly SplittingEnumerable<T> splitter;
private readonly NextEnumerable parent;
private readonly IEnumerator<T> enumerator;
private T currentItem;
public NextEnumerator(SplittingEnumerable<T> splitter, NextEnumerable parent, IEnumerator<T> enumerator)
{
this.splitter = splitter;
this.parent = parent;
this.enumerator = enumerator;
}
public bool MoveNext()
{
this.parent.currentSize += 1;
this.currentItem = this.splitter.lastItem;
var hasCurrent = this.splitter.hasCurrent;
this.parent.IsComplete = this.parent.currentSize > this.splitter.maxSize;
if (this.parent.IsComplete)
{
return false;
}
if (hasCurrent)
{
var result = this.enumerator.MoveNext();
this.splitter.lastItem = this.enumerator.Current;
this.splitter.hasCurrent = result;
}
return hasCurrent;
}
public void Reset()
{
throw new System.NotImplementedException();
}
public T Current
{
get { return this.currentItem; }
}
object IEnumerator.Current
{
get { return this.Current; }
}
public void Dispose()
{
}
}
}
}
Upvotes: 1
Reputation: 20630
If you start with a sequence defined as an IEnumerable<T>, and you know that it can safely be enumerated multiple times (e.g. because it is an array or a list), you can just use this simple pattern to process the elements in batches:
while (sequence.Any())
{
var batch = sequence.Take(10);
sequence = sequence.Skip(10);
// do whatever you need to do with each batch here
}
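A quick sketch of the pattern on a concrete list (the names here are just for illustration). Note that each `Skip` chains another deferred skip onto the sequence, so every `Any()` pass re-walks the already-skipped prefix; that makes the whole loop roughly O(n²/batchSize) enumeration steps, which is fine for small in-memory lists but costly for large sources:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class SkipTakeBatchDemo
{
    static void Main()
    {
        // A materialized list, so repeated enumeration is safe (a precondition of this pattern).
        IEnumerable<int> sequence = Enumerable.Range(0, 10).ToList();

        var batchSizes = new List<int>();
        while (sequence.Any())
        {
            var batch = sequence.Take(3).ToList();
            sequence = sequence.Skip(3);
            batchSizes.Add(batch.Count);
        }

        // 10 items in batches of 3 -> sizes 3, 3, 3, 1
        Console.WriteLine(string.Join(",", batchSizes)); // prints "3,3,3,1"
    }
}
```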
Upvotes: 50
Reputation: 3037
Same approach as MoreLINQ, but using a List instead of an Array. I haven't benchmarked it, but readability matters more to some people:
public static IEnumerable<IEnumerable<T>> Batch<T>(this IEnumerable<T> source, int size)
{
List<T> batch = new List<T>();
foreach (var item in source)
{
batch.Add(item);
if (batch.Count >= size)
{
yield return batch;
// hand out a fresh list rather than reusing the cleared one,
// so deferred consumers don't see their batch emptied later
batch = new List<T>(size);
}
}
if (batch.Count > 0)
{
yield return batch;
}
}
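A usage sketch; the extension is repeated inside the snippet (allocating a fresh list per batch) so it compiles on its own:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class ListBatchExtensions
{
    // The extension from the answer above, repeated here for a self-contained example.
    public static IEnumerable<IEnumerable<T>> Batch<T>(this IEnumerable<T> source, int size)
    {
        List<T> batch = new List<T>(size);
        foreach (var item in source)
        {
            batch.Add(item);
            if (batch.Count >= size)
            {
                yield return batch;
                batch = new List<T>(size);
            }
        }
        if (batch.Count > 0)
        {
            yield return batch;
        }
    }
}

class ListBatchDemo
{
    static void Main()
    {
        foreach (var batch in Enumerable.Range(1, 10).Batch(4))
            Console.WriteLine(string.Join(",", batch));
        // prints:
        // 1,2,3,4
        // 5,6,7,8
        // 9,10
    }
}
```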
Upvotes: 3
Reputation: 2090
I'm joining this very late, but I found something interesting: we can use Skip and Take here for better performance.
public static class MyExtensions
{
public static IEnumerable<IEnumerable<T>> Batch<T>(this IEnumerable<T> items, int maxItems)
{
return items.Select((item, index) => new { item, index })
.GroupBy(x => x.index / maxItems)
.Select(g => g.Select(x => x.item));
}
public static IEnumerable<T> Batch2<T>(this IEnumerable<T> items, int skip, int take)
{
return items.Skip(skip).Take(take);
}
}
Next I checked with 100,000 records. Only the looping takes more time in the case of Batch.
Code of the console application:
static void Main(string[] args)
{
List<string> Ids = GetData("First");
List<string> Ids2 = GetData("tsriF");
Stopwatch FirstWatch = new Stopwatch();
FirstWatch.Start();
foreach (var batch in Ids2.Batch(5000))
{
// Console.WriteLine("Batch Output:= " + string.Join(",", batch));
}
FirstWatch.Stop();
Console.WriteLine("Done Processing time taken:= "+ FirstWatch.Elapsed.ToString());
Stopwatch Second = new Stopwatch();
Second.Start();
int Length = Ids2.Count;
int StartIndex = 0;
int BatchSize = 5000;
while (Length > 0)
{
var SecBatch = Ids2.Batch2(StartIndex, BatchSize);
// Console.WriteLine("Second Batch Output:= " + string.Join(",", SecBatch));
Length = Length - BatchSize;
StartIndex += BatchSize;
}
Second.Stop();
Console.WriteLine("Done Processing time taken Second:= " + Second.Elapsed.ToString());
Console.ReadKey();
}
static List<string> GetData(string name)
{
List<string> Data = new List<string>();
for (int i = 0; i < 100000; i++)
{
Data.Add(string.Format("{0} {1}", name, i.ToString()));
}
return Data;
}
The time taken is like this:
First - 00:00:00.0708 , 00:00:00.0660
Second (Take and Skip One) - 00:00:00.0008, 00:00:00.0008
Upvotes: 1
Reputation: 51
static IEnumerable<IEnumerable<T>> TakeBatch<T>(IEnumerable<T> ts,int batchSize)
{
return from @group in ts.Select((x, i) => new { x, i }).ToLookup(xi => xi.i / batchSize)
select @group.Select(xi => xi.x);
}
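A usage sketch (the method is repeated inside the snippet so it compiles on its own). Note that ToLookup materializes the whole source before the first batch is available, so this isn't suited to very large or infinite sequences:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class TakeBatchDemo
{
    // The method from the answer above, repeated for a self-contained example.
    public static IEnumerable<IEnumerable<T>> TakeBatch<T>(IEnumerable<T> ts, int batchSize)
    {
        return from @group in ts.Select((x, i) => new { x, i }).ToLookup(xi => xi.i / batchSize)
               select @group.Select(xi => xi.x);
    }

    static void Main()
    {
        var batches = TakeBatch(Enumerable.Range(0, 7), 3).ToList();
        Console.WriteLine(batches.Count);                    // prints "3"
        Console.WriteLine(string.Join(",", batches.Last())); // prints "6"
    }
}
```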
Upvotes: -3
Reputation: 2799
All of the above perform terribly with large batches or in low-memory situations. I had to write my own that pipelines (notice there is no item accumulation anywhere):
public static class BatchLinq {
public static IEnumerable<IEnumerable<T>> Batch<T>(this IEnumerable<T> source, int size) {
if (size <= 0)
throw new ArgumentOutOfRangeException("size", "Must be greater than zero.");
using (IEnumerator<T> enumerator = source.GetEnumerator())
while (enumerator.MoveNext())
yield return TakeIEnumerator(enumerator, size);
}
private static IEnumerable<T> TakeIEnumerator<T>(IEnumerator<T> source, int size) {
int i = 0;
do
yield return source.Current;
while (++i < size && source.MoveNext());
}
}
Edit: A known issue with this approach is that each batch must be enumerated, and enumerated fully, before moving on to the next batch. For example, this doesn't work:
//Select first item of every 100 items
Batch(list, 100).Select(b => b.First())
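One way to stay within that constraint, sketched here with the class repeated so the snippet compiles on its own, is to drain each batch into a list before touching the next one:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class BatchLinq
{
    // The pipelined Batch from the answer above, repeated for a self-contained example.
    public static IEnumerable<IEnumerable<T>> Batch<T>(this IEnumerable<T> source, int size)
    {
        if (size <= 0)
            throw new ArgumentOutOfRangeException("size", "Must be greater than zero.");
        using (IEnumerator<T> enumerator = source.GetEnumerator())
            while (enumerator.MoveNext())
                yield return TakeIEnumerator(enumerator, size);
    }

    private static IEnumerable<T> TakeIEnumerator<T>(IEnumerator<T> source, int size)
    {
        int i = 0;
        do
            yield return source.Current;
        while (++i < size && source.MoveNext());
    }
}

class PipelinedBatchDemo
{
    static void Main()
    {
        var firsts = new List<int>();
        foreach (var batch in Enumerable.Range(0, 10).Batch(3))
        {
            var buffered = batch.ToList(); // enumerate the batch fully, in order
            firsts.Add(buffered.First());
        }
        Console.WriteLine(string.Join(",", firsts)); // prints "0,3,6,9"
    }
}
```

Buffering one batch at a time keeps peak memory at a single batch's worth of items while still satisfying the enumerate-fully-in-order requirement.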
Upvotes: 30
Reputation: 116168
public static class MyExtensions
{
public static IEnumerable<IEnumerable<T>> Batch<T>(this IEnumerable<T> items,
int maxItems)
{
return items.Select((item, inx) => new { item, inx })
.GroupBy(x => x.inx / maxItems)
.Select(g => g.Select(x => x.item));
}
}
and the usage would be:
List<int> list = new List<int>() { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 };
foreach(var batch in list.Batch(3))
{
Console.WriteLine(String.Join(",",batch));
}
OUTPUT:
0,1,2
3,4,5
6,7,8
9
Upvotes: 126