ElektroStudios
ElektroStudios

Reputation: 20464

Split collection into n of parts is not giving the desired resulting secuences

I'm trying to split a collection into an specific number of parts, I've taken some help seeying solutions over StackOverflow: Split a collection into `n` parts with LINQ?

This is my VB.Net translation from @Hasan Khan solution:

''' <summary>
''' Splits an <see cref="IEnumerable(Of T)"/> into the specified amount of secuences.
''' </summary>
Public Shared Function SplitIntoParts(Of T)(ByVal col As IEnumerable(Of T),
                                            ByVal amount As Integer) As IEnumerable(Of IEnumerable(Of T))

    Dim i As Integer = 0

    Dim splits As IEnumerable(Of IEnumerable(Of T)) =
                 From item As T In col
                 Group item By item = Threading.Interlocked.Increment(i) Mod amount
                 Into Group
                 Select Group.AsEnumerable()

    Return splits


End Function

And this my VB.Net translation of @manu08 solution:

''' <summary>
''' Splits an <see cref="IEnumerable(Of T)"/> into the specified amount of secuences.
''' </summary>
Public Shared Function SplitIntoParts(Of T)(ByVal col As IEnumerable(Of T),
                                            ByVal amount As Integer) As IEnumerable(Of IEnumerable(Of T))

    Return col.Select(Function(item, index) New With {index, item}).
               GroupBy(Function(x) x.index Mod amount).
               Select(Function(x) x.Select(Function(y) y.item))

End Function

The problem is that both functions returns a wrong result.

Because if I split a collection like this:

Dim mainCol As IEnumerable(Of Integer) = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}

Dim splittedCols As IEnumerable(Of IEnumerable(Of Integer)) =
    SplitIntoParts(col:=mainCol, amount:=2)

Both functions gives this result:

1: { 1, 3, 5, 7, 9 }
2: { 2, 4, 6, 8, 10 }

Instead of these secuences:

1: { 1, 2, 3, 4, 5 } 
2: { 6, 7, 8, 9, 10 }

What I'm doing wrong?.

Upvotes: 1

Views: 184

Answers (3)

rtf_leg
rtf_leg

Reputation: 1819

MyExtensions class has two public Split methods:

  1. For ICollection - iterates through collection only once - for splitting.
  2. For IEnumerable - iterates through enumerable twice: for count items and for split them. Do not use this if possible (first one is safe and twice faster).

More of that: this algorithm is trying to bring back exactly specified number of collections.

public static class MyExtensions
{
    // Works with ICollection - iterates through collection only once.
    public static IEnumerable<IEnumerable<T>> Split<T>(this ICollection<T> items, int count)
    {
        return Split(items, items.Count, count);
    }

    // Works with IEnumerable and iterates items TWICE: first for count items, second to split them.
    public static IEnumerable<IEnumerable<T>> Split<T>(this IEnumerable<T> items, int count)
    {            
        // ReSharper disable PossibleMultipleEnumeration
        var itemsCount = items.Count();
        return Split(items, itemsCount, count);
        // ReSharper restore PossibleMultipleEnumeration
    }

    private static IEnumerable<IEnumerable<T>> Split<T>(this IEnumerable<T> items, int itemsCount, int partsCount)
    {
        if (items == null)
            throw new ArgumentNullException("items");
        if (partsCount <= 0)
            throw new ArgumentOutOfRangeException("partsCount");

        var rem = itemsCount % partsCount;
        var min = itemsCount / partsCount;
        var max = rem != 0 ? min + 1 : min;

        var index = 0;
        var enumerator = items.GetEnumerator();

        while (index < itemsCount)
        {
            var size = 0 < rem-- ? max : min;
            yield return SplitPart(enumerator, size);
            index += size;
        }
    }

    private static IEnumerable<T> SplitPart<T>(IEnumerator<T> enumerator, int count)
    {
        for (var i = 0; i < count; i++)
        {
            if (!enumerator.MoveNext())
                break;
            yield return enumerator.Current;
        }            
    }
}

Example program:

var items = new [] {'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'};

for(var i = 1; i <= items.Length + 3; i++)
{
    Console.WriteLine("{0} part(s)", i);
    foreach (var part in items.Split(i))
        Console.WriteLine(string.Join(", ", part));
    Console.WriteLine();
}

... and output of this program:

1 part(s)
a, b, c, d, e, f, g, h, i, j

2 part(s)
a, b, c, d, e
f, g, h, i, j

3 part(s)
a, b, c, d
e, f, g
h, i, j

4 part(s)
a, b, c
d, e, f
g, h
i, j

5 part(s)
a, b
c, d
e, f
g, h
i, j

6 part(s)
a, b
c, d
e, f
g, h
i
j

7 part(s)
a, b
c, d
e, f
g
h
i
j

8 part(s)
a, b
c, d
e
f
g
h
i
j

9 part(s)
a, b
c
d
e
f
g
h
i
j

10 part(s)
a
b
c
d
e
f
g
h
i
j

11 part(s) // Only 10 items in collection.
a
b
c
d
e
f
g
h
i
j

12 part(s) // Only 10 items in collection.
a
b
c
d
e
f
g
h
i
j

13 part(s)  // Only 10 items in collection.
a
b
c
d
e
f
g
h
i
j

Upvotes: 3

Szer
Szer

Reputation: 3476

Inefficient solution (too many iterations over data):

class Program
{
    static void Main(string[] args)
    {
        var data = Enumerable.Range(1, 10);
        var result = data.Split(2);            
    }
}

static class Extensions
{
    public static IEnumerable<IEnumerable<T>> Split<T>(this IEnumerable<T> col, int amount)
    {
        var chunkSize = (int)Math.Ceiling((double)col.Count() / (double)amount);

        for (var i = 0; i < amount; ++i)
            yield return col.Skip(chunkSize * i).Take(chunkSize);
    }
}

EDIT:

In VB.Net

Public Shared Iterator Function SplitIntoParts(Of T)(ByVal col As IEnumerable(Of T),
                                                     ByVal amount As Integer) As IEnumerable(Of IEnumerable(Of T))

    Dim chunkSize As Integer = CInt(Math.Ceiling(CDbl(col.Count()) / CDbl(amount)))

    For i As Integer = 0 To amount - 1
        Yield col.Skip(chunkSize * i).Take(chunkSize)
    Next

End Function

Upvotes: 1

sloth
sloth

Reputation: 101072

You're not doing something wrong; it's just that the methods you use don't keep the ordering the way you want. Think about how mod and GroupBy work and you see why.

I suggest you use Jon Skeet's answer, since it keeps the ordering of your collection (I took the liberty to translate it for you into VB.Net).

You just have to calculate the size of each partition beforehand, since it does not split a collection into n chunks, but into chunks of a length of n:

<Extension> _
Public Shared Iterator Function Partition(Of T)(source As IEnumerable(Of T), size As Integer) As IEnumerable(Of IEnumerable(Of T)) 
    Dim array__1 As T() = Nothing
    Dim count As Integer = 0
    For Each item As T In source
        If array__1 Is Nothing Then
            array__1 = New T(size - 1) {}
        End If
        array__1(count) = item
        count += 1
        If count = size Then
            yield New ReadOnlyCollection(Of T)(array__1)
            array__1 = Nothing
            count = 0
        End If
    Next
    If array__1 IsNot Nothing Then
        Array.Resize(array__1, count)
        yield New ReadOnlyCollection(Of T)(array__1)
    End If
End Function

And to use it:

mainCol.Partition(CInt(Math.Ceiling(mainCol.Count() / 2)))

Feel free to hide the Partition(CInt(Math.Ceiling(...)) part in a new method.

Upvotes: 1

Related Questions