ajbrun
ajbrun

Reputation: 333

Grouping lists into groups of X items per group

I'm having a problem knowing the best way to make a method to group a list of items into groups of (for example) no more than 3 items. I've created the method below, but without doing a ToList on the group before I return it, I have a problem with it if the list is enumerated multiple times.

The first time it's enumerated is correct, but any additional enumeration is thrown off because the two variables (i and groupKey) appear to be remembered between the iterations.

So the questions are:

Upvotes: 8

Views: 10191

Answers (5)

ErrCode
ErrCode

Reputation: 306

This is based on Selman's Select with index idea, but using ToLookup to combine both the GroupBy and Select together as one:

public static IEnumerable<IEnumerable<TSource>> GroupBy<TSource>
        (this IEnumerable<TSource> source, int itemsPerGroup)
{    
    return source.Select((x, idx) => new { x, idx })
            .ToLookup(q => q.idx / itemsPerGroup, q => q.x);
}

The main difference though is that ToLookup actually evaluates results immediately (as concisely explained here: https://stackoverflow.com/a/11969517/7270462), which may or may not be desired.

Upvotes: 0

Nicholas Carey
Nicholas Carey

Reputation: 74277

The problem with using GroupBy() is that unless it somehow has knowledge under the hood that the input is ordered by key value, it has to read the entire sequence and allocate everything to its bucket before it can emit a single group. That's overkill in this case, since the key is a function of its ordinal position within the sequence.

I like the source.Skip(m).Take(n) approach, but that makes assumptions that items in source can be directly addressed. If that's not true or Skip() and Take() have no knowledge of the underlying implementation, then the production of each group is going to be an O(n/2) operation on the average, as it repeatedly iterates over source to produce the group.

This makes the overall partitioning operation, potentially quite expensive.

  • IF producing a group is an O(n/2) operation on the average, and
  • Given a group size of s, the production of approximately n/s groups is required,

Then the total cost of the operation is something like O(n2/2s), right?

So, I would do something this, an O(n) operation (feel free to use an IGrouping implementation if you'd like):

public static IEnumerable<KeyValuePair<int,T[]>> Partition<T>( this IEnumerable<T> source , int partitionSize )
{
  if ( source        == null ) throw new ArgumentNullException("source") ;
  if ( partitionSize <  1    ) throw new ArgumentOutOfRangeException("partitionSize") ;

  int     i         = 0 ;
  List<T> partition = new List<T>( partitionSize ) ;

  foreach( T item in source )
  {
    partition.Add(item) ;
    if ( partition.Count == partitionSize )
    {
      yield return new KeyValuePair<int,T[]>( ++i , partition.ToArray() ) ;
      partition.Clear() ;
    }
  }

  // return the last partition if necessary
  if ( partition.Count > 0 )
  {
    yield return new Partition<int,T>( ++i , items.ToArray() ) ;
  }

}

Upvotes: 5

Travis J
Travis J

Reputation: 82287

.net Fiddle

Essentially you have an IEnumerable, and you want to group it into an IEnumerable of IGroupables which each contain the key as an index and the group as the values. Your version does seem to accomplish on the first pass, but I think that you can definitely stream line a little bit.

Using skip and take is the most desirable way to accomplish in my opinion, but the custom key for grouping is where there is an issue. There is a way around this which is to create your own class as a grouping template (seen in this answer: https://stackoverflow.com/a/5073144/1026459).

The end result is this:

public static class GroupExtension
{
    public static IEnumerable<IGrouping<int, T>> GroupAt<T>(this IEnumerable<T> source, int itemsPerGroup)
    {
        for(int i = 0; i < (int)Math.Ceiling( (double)source.Count() / itemsPerGroup ); i++)
        {
            var currentGroup = new Grouping<int,T>{ Key = i };
            currentGroup.AddRange(source.Skip(itemsPerGroup*i).Take(itemsPerGroup));
            yield return currentGroup;
        }
    }
    private class Grouping<TKey, TElement> : List<TElement>, IGrouping<TKey, TElement>
    {
        public TKey Key { get; set; }
    }
}

And here is the demo in the fiddle which consumes it on a simple string

public class Program
{
    public void Main(){
        foreach(var p in getLine().Select(s => s).GroupAt(3))
            Console.WriteLine(p.Aggregate("",(s,val) => s += val));
    }
    public string getLine(){ return "Hello World, how are you doing, this just some text to show how the grouping works"; }
}

edit

Alternatively as just an IEnumerable of IEnumerable

public static IEnumerable<IEnumerable<T>> GroupAt<T>(this IEnumerable<T> source, int itemsPerGroup)
{
    for(int i = 0; i < (int)Math.Ceiling( (double)source.Count() / itemsPerGroup ); i++)
        yield return source.Skip(itemsPerGroup*i).Take(itemsPerGroup);
}

Upvotes: 3

Anthony Chu
Anthony Chu

Reputation: 37520

Here's one way to do this using LINQ...

public static IEnumerable<IGrouping<int, TSource>> GroupBy<TSource>
    (this IEnumerable<TSource> source, int itemsPerGroup)
{
    return source.Zip(Enumerable.Range(0, source.Count()),
                      (s, r) => new { Group = r / itemsPerGroup, Item = s })
                 .GroupBy(i => i.Group, g => g.Item)
                 .ToList();
}

Live Demo

Upvotes: 18

Selman Gen&#231;
Selman Gen&#231;

Reputation: 101691

I think you are looking for something like this:

return source.Select((x, idx) => new { x, idx })
      .GroupBy(x => x.idx / itemsPerGroup)
      .Select(g => g.Select(a => a.x));

You need to change your return type as IEnumerable<IEnumerable<TSource>>

Upvotes: 11

Related Questions