Alex Seleznyov
Alex Seleznyov

Reputation: 945

Split array into chunks - is there faster method?

I'm looking for the quickest method to split an array into chunks of fixed size (last one can be smaller, of course). I've looked throughout the site and didn't find any performance-wise comparisons, so I wrote them, and here are the results:

Time in microseconds, mean/error/stddev

For int[] - 30.02 | 0.1002 | 0.0937

For IEnumerable<int> - 76.67 | 0.2146 | 0.1902

Update: Version below (in @Markus' answer) is 139.5 | 0.6702 | 0.5597

Most popular here on SO and frequently recommended method to use LINQ's GroupBy with index/chunkSize is a no go - 267 microseconds is way too greater than either of aforementioned implementations.

Is there any faster way to split arrays?

P.S. This is the code for Array and IEnumerable<T>:

    /// <summary>
    /// Splits <paramref name="source"/> into chunks of size not greater than <paramref name="chunkMaxSize"/>
    /// </summary>
    /// <typeparam name="T"></typeparam>
    /// <param name="source">Array to be split</param>
    /// <param name="chunkMaxSize">Max size of chunk</param>
    /// <returns><see cref="IEnumerable{T}"/> of <see cref="Array"/> of <typeparam name="T"/></returns>
    public static IEnumerable<T[]> AsChunks<T>(this T[] source, int chunkMaxSize)
    {
        var pos = 0;
        var sourceLength = source.Length;
        do
        {
            var len = Math.Min(pos + chunkMaxSize, sourceLength) - pos;
            if (len == 0)
            {
                yield break;;
            }
            var arr = new T[len];
            Array.Copy(source, pos, arr, 0, len);
            pos += len;
            yield return arr;
        } while (pos < sourceLength);
    }

    /// <summary>
    /// Splits <paramref name="source"/> into chunks of size not greater than <paramref name="chunkMaxSize"/>
    /// </summary>
    /// <typeparam name="T"></typeparam>
    /// <param name="source"><see cref="IEnumerable{T}"/> to be split</param>
    /// <param name="chunkMaxSize">Max size of chunk</param>
    /// <returns><see cref="IEnumerable{T}"/> of <see cref="Array"/> of <typeparam name="T"/></returns>
    public static IEnumerable<T[]> AsChunks<T>(this IEnumerable<T> source, int chunkMaxSize)
    {
        var arr = new T[chunkMaxSize];
        var pos = 0;
        foreach (var item in source)
        {
            arr[pos++] = item;
            if (pos == chunkMaxSize)
            {
                yield return arr;
                arr = new T[chunkMaxSize];
                pos = 0;
            }
        }
        if (pos > 0)
        {
            Array.Resize(ref arr, pos);
            yield return arr;
        }
    }

P.P.S Complete solution with BenchmarkDotNet tests is on GitHub.

Upvotes: 3

Views: 4711

Answers (5)

Arek Felinczak
Arek Felinczak

Reputation: 45

I know this is very old topic, but it lacks solution using Memory<T> and yield. Based on solution using ArraySegment we can use

public static IEnumerable<Memory<T>> ChunkViaMemory(this T[] source, int size)
{
    var chunks = source.Length / size;
    for (int i = 0; i < chunks; i++)
    {
        yield return source.AsMemory(i * size, size);
    }
    var leftOver = source.Length % size;
    if (leftOver > 0)
    {
        yield return source.AsMemory(chunks * size, leftOver);
    }
}

I run benchmark for array of 50k items chucked into 100 items, for standard Chunk method and compare with ArraySegment and Memory<T>.

Results shows this approach is much faster then alternatives.

Method Mean Error StdDev Gen0 Allocated
Chunk 1,408.33 us 22.359 us 20.914 us 97.6563 412098 B
ChunkArraySegment 113.89 us 0.750 us 0.626 us - 80 B
ChunkMemory 32.19 us 0.439 us 0.411 us - 72 B

Upvotes: 3

obfuscated
obfuscated

Reputation: 71

public static List<someObject[]> splitArrayIntoSmallerArrays<someObject>(someObject[] arrayOfSomeObject, int chunkSize)
{
    var output = new List<someObject[]>();
    var newestArray = new List<someObject>();
    for (int i = 0, loopTo = arrayOfSomeObject.Count() - 1; i <= loopTo; i++)
    {

        newestArray.Add(arrayOfSomeObject[i]);
        if (newestArray.Count == chunkSize)
        {
            output.Add(newestArray.ToArray());
            newestArray = new List<someObject>();
        }
    }
    output.Add(newestArray.ToArray());
    return output;
}

Upvotes: 0

Babaso Kale
Babaso Kale

Reputation: 1

You can use this few lines of code

                int chunkSize = 4;
                int skipIndex = 0;
                string[] words = { "one", "two", "three", "four", "five", "six" };
                string[][] chunckResult = new string[words.Length][];

                for (int index = 0; index < words.Length; index++)
                {
                    chunckResult[index] = words.Skip(skipIndex).Take(chunkSize).ToArray();
                    skipIndex += chunkSize;
                    if (skipIndex == words.Length)
                    {
                        break;
                    }
                }

Upvotes: 0

InBetween
InBetween

Reputation: 32770

Not really sure how this stacks up (ArraySegment), but give it a try. I've avoided using iterators but not really sure if that is really a gain here.

public static IEnumerable<T[]> AsChunks<T>(
    this T[] source, int chunkMaxSize)
{
    var chunks = source.Length / chunkMaxSize;
    var leftOver = source.Length % chunkMaxSize;
    var result = new List<T[]>(chunks + 1);
    var offset = 0;

    for (var i = 0; i < chunks; i++)
    {
        result.Add(new ArraySegment<T>(source,
                                       offset, 
                                       chunkMaxSize).ToArray());
        offset += chunkMaxSize;
    }

    if (leftOver > 0)
    {
        result.Add(new ArraySegment<T>(source, 
                                       offset, 
                                       leftOver).ToArray());
    }

    return result;
}

Upvotes: 10

Markus
Markus

Reputation: 2281

I used this Code:

Source: http://blogs.msdn.com/b/pfxteam/archive/2012/11/16/plinq-and-int32-maxvalue.aspx

public static IEnumerable<IEnumerable<T>> Batch<T>(this IEnumerable<T> source, int batchSize) {
    if (batchSize <= 0) {
        throw new ArgumentOutOfRangeException(nameof(batchSize));
    }
    using(var enumerator = source.GetEnumerator()) {
        while (enumerator.MoveNext()) {
            yield return YieldBatchElements(enumerator, batchSize - 1);
        }
    }
}

private static IEnumerable<T> YieldBatchElements<T>(IEnumerator<T> source, int batchSize) {
    yield return source.Current;
    for (var i = 0; i < batchSize && source.MoveNext(); i++) {
        yield return source.Current;
    }
}

Upvotes: -1

Related Questions