Reputation: 945
I'm looking for the quickest method to split an array into chunks of fixed size (last one can be smaller, of course). I've looked throughout the site and didn't find any performance-wise comparisons, so I wrote them, and here are the results:
Time in microseconds, mean/error/stddev:

For int[] - 30.02 | 0.1002 | 0.0937
For IEnumerable<int> - 76.67 | 0.2146 | 0.1902
Update: the version below (in @Markus' answer) comes in at 139.5 | 0.6702 | 0.5597.
The most popular and frequently recommended method here on SO, LINQ's GroupBy with index/chunkSize, is a no-go: at 267 microseconds it is far slower than either of the aforementioned implementations.
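For reference, the GroupBy-based approach referred to above is typically written along these lines (a sketch; exact variants on SO differ slightly):

```csharp
using System;
using System.Linq;

class Program
{
    static void Main()
    {
        int chunkSize = 3;
        int[] source = Enumerable.Range(1, 10).ToArray();

        // Pair each element with its index, then group by index / chunkSize,
        // so indices 0..2 form chunk 0, 3..5 form chunk 1, and so on.
        int[][] chunks = source
            .Select((value, index) => new { value, index })
            .GroupBy(x => x.index / chunkSize)
            .Select(g => g.Select(x => x.value).ToArray())
            .ToArray();

        Console.WriteLine(chunks.Length);                // 4
        Console.WriteLine(string.Join(",", chunks[3]));  // 10
    }
}
```

The anonymous-type allocation per element and the hashing done by GroupBy are what make this approach slow relative to a plain copy loop.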
Is there any faster way to split arrays?
P.S. This is the code for Array and IEnumerable<T>:
/// <summary>
/// Splits <paramref name="source"/> into chunks of size not greater than <paramref name="chunkMaxSize"/>
/// </summary>
/// <typeparam name="T">Element type</typeparam>
/// <param name="source">Array to be split</param>
/// <param name="chunkMaxSize">Max size of chunk</param>
/// <returns><see cref="IEnumerable{T}"/> of <see cref="Array"/> of <typeparamref name="T"/></returns>
public static IEnumerable<T[]> AsChunks<T>(this T[] source, int chunkMaxSize)
{
    var pos = 0;
    var sourceLength = source.Length;
    do
    {
        var len = Math.Min(pos + chunkMaxSize, sourceLength) - pos;
        if (len == 0)
        {
            yield break;
        }
        var arr = new T[len];
        Array.Copy(source, pos, arr, 0, len);
        pos += len;
        yield return arr;
    } while (pos < sourceLength);
}
/// <summary>
/// Splits <paramref name="source"/> into chunks of size not greater than <paramref name="chunkMaxSize"/>
/// </summary>
/// <typeparam name="T">Element type</typeparam>
/// <param name="source"><see cref="IEnumerable{T}"/> to be split</param>
/// <param name="chunkMaxSize">Max size of chunk</param>
/// <returns><see cref="IEnumerable{T}"/> of <see cref="Array"/> of <typeparamref name="T"/></returns>
public static IEnumerable<T[]> AsChunks<T>(this IEnumerable<T> source, int chunkMaxSize)
{
    var arr = new T[chunkMaxSize];
    var pos = 0;
    foreach (var item in source)
    {
        arr[pos++] = item;
        if (pos == chunkMaxSize)
        {
            yield return arr;
            arr = new T[chunkMaxSize];
            pos = 0;
        }
    }
    if (pos > 0)
    {
        Array.Resize(ref arr, pos);
        yield return arr;
    }
}
P.P.S. The complete solution with BenchmarkDotNet tests is on GitHub.
Upvotes: 3
Views: 4711
Reputation: 45
I know this is a very old topic, but it lacks a solution using Memory<T> and yield. Based on the solution using ArraySegment, we can use:
public static IEnumerable<Memory<T>> ChunkViaMemory<T>(this T[] source, int size)
{
    var chunks = source.Length / size;
    for (int i = 0; i < chunks; i++)
    {
        yield return source.AsMemory(i * size, size);
    }
    var leftOver = source.Length % size;
    if (leftOver > 0)
    {
        yield return source.AsMemory(chunks * size, leftOver);
    }
}
I ran a benchmark on an array of 50k items chunked into chunks of 100 items, comparing the standard Chunk method with the ArraySegment and Memory<T> approaches. The results show this approach is much faster than the alternatives.
| Method | Mean | Error | StdDev | Gen0 | Allocated |
|---|---|---|---|---|---|
| Chunk | 1,408.33 us | 22.359 us | 20.914 us | 97.6563 | 412098 B |
| ChunkArraySegment | 113.89 us | 0.750 us | 0.626 us | - | 80 B |
| ChunkMemory | 32.19 us | 0.439 us | 0.411 us | - | 72 B |
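For reference, the standard Chunk baseline above is LINQ's Enumerable.Chunk, available since .NET 6. A minimal usage sketch:

```csharp
using System;
using System.Linq;

class Program
{
    static void Main()
    {
        int[] source = Enumerable.Range(1, 10).ToArray();

        // Enumerable.Chunk (.NET 6+) splits a sequence into arrays of at
        // most the given size; the last chunk may be smaller.
        int[][] chunks = source.Chunk(3).ToArray();

        Console.WriteLine(chunks.Length);                // 4
        Console.WriteLine(string.Join(",", chunks[3]));  // 10
    }
}
```

Unlike the Memory<T> version, each chunk is a freshly allocated array, which is where the extra allocations in the Chunk row come from.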
Upvotes: 3
Reputation: 71
public static List<someObject[]> splitArrayIntoSmallerArrays<someObject>(someObject[] arrayOfSomeObject, int chunkSize)
{
    var output = new List<someObject[]>();
    var newestArray = new List<someObject>();
    for (int i = 0; i < arrayOfSomeObject.Length; i++)
    {
        newestArray.Add(arrayOfSomeObject[i]);
        if (newestArray.Count == chunkSize)
        {
            output.Add(newestArray.ToArray());
            newestArray = new List<someObject>();
        }
    }
    // Only emit the final chunk if it has items; otherwise an empty
    // trailing array is produced when the length divides evenly.
    if (newestArray.Count > 0)
    {
        output.Add(newestArray.ToArray());
    }
    return output;
}
Upvotes: 0
Reputation: 1
You can use these few lines of code:
int chunkSize = 4;
int skipIndex = 0;
string[] words = { "one", "two", "three", "four", "five", "six" };
// Size the outer array by the number of chunks, not the number of
// words, so no trailing null entries are left behind.
int chunkCount = (words.Length + chunkSize - 1) / chunkSize;
string[][] chunkResult = new string[chunkCount][];
for (int index = 0; index < chunkCount; index++)
{
    chunkResult[index] = words.Skip(skipIndex).Take(chunkSize).ToArray();
    skipIndex += chunkSize;
}
Upvotes: 0
Reputation: 32770
Not really sure how this (using ArraySegment) stacks up, but give it a try. I've avoided using iterators, though I'm not sure whether that is really a gain here.
public static IEnumerable<T[]> AsChunks<T>(this T[] source, int chunkMaxSize)
{
    var chunks = source.Length / chunkMaxSize;
    var leftOver = source.Length % chunkMaxSize;
    var result = new List<T[]>(chunks + 1);
    var offset = 0;
    for (var i = 0; i < chunks; i++)
    {
        result.Add(new ArraySegment<T>(source, offset, chunkMaxSize).ToArray());
        offset += chunkMaxSize;
    }
    if (leftOver > 0)
    {
        result.Add(new ArraySegment<T>(source, offset, leftOver).ToArray());
    }
    return result;
}
Upvotes: 10
Reputation: 2281
I used this code:
Source: http://blogs.msdn.com/b/pfxteam/archive/2012/11/16/plinq-and-int32-maxvalue.aspx
public static IEnumerable<IEnumerable<T>> Batch<T>(this IEnumerable<T> source, int batchSize)
{
    if (batchSize <= 0)
    {
        throw new ArgumentOutOfRangeException(nameof(batchSize));
    }
    using (var enumerator = source.GetEnumerator())
    {
        while (enumerator.MoveNext())
        {
            yield return YieldBatchElements(enumerator, batchSize - 1);
        }
    }
}

private static IEnumerable<T> YieldBatchElements<T>(IEnumerator<T> source, int batchSize)
{
    yield return source.Current;
    for (var i = 0; i < batchSize && source.MoveNext(); i++)
    {
        yield return source.Current;
    }
}
Upvotes: -1