Reputation: 333
I'm having a problem knowing the best way to make a method to group a list of items into groups of (for example) no more than 3 items. I've created the method below, but without doing a ToList
on the group before I return it, I have a problem with it if the list is enumerated multiple times.
The first time it's enumerated is correct, but any additional enumeration is thrown off because the two variables (i and groupKey) appear to be remembered between the iterations.
So the questions are:
Is simply ToListing the resulting group before it leaves this method really such a bad idea?
public static IEnumerable<IGrouping<int, TSource>> GroupBy<TSource>
(this IEnumerable<TSource> source, int itemsPerGroup)
{
const int initial = 1;
int i = initial;
int groupKey = 0;
var groups = source.GroupBy(x =>
{
if (i == initial)
{
groupKey = 0;
}
if (i > initial)
{
//Increase the group key if we've counted past the items per group
if (itemsPerGroup == initial || i % itemsPerGroup == 1)
{
groupKey++;
}
}
i++;
return groupKey;
});
return groups;
}
Upvotes: 8
Views: 10191
Reputation: 306
This is based on Selman's Select
with index idea, but using ToLookup
to combine both the GroupBy
and Select
together as one:
public static IEnumerable<IEnumerable<TSource>> GroupBy<TSource>
(this IEnumerable<TSource> source, int itemsPerGroup)
{
return source.Select((x, idx) => new { x, idx })
.ToLookup(q => q.idx / itemsPerGroup, q => q.x);
}
The main difference though is that ToLookup
actually evaluates results immediately (as concisely explained here: https://stackoverflow.com/a/11969517/7270462), which may or may not be desired.
Upvotes: 0
Reputation: 74277
The problem with using GroupBy()
is that unless it somehow has knowledge under the hood that the input is ordered by key value, it has to read the entire sequence and allocate everything to its bucket before it can emit a single group. That's overkill in this case, since the key is a function of its ordinal position within the sequence.
I like the source.Skip(m).Take(n)
approach, but that makes assumptions that items in source
can be directly addressed. If that's not true or Skip()
and Take()
have no knowledge of the underlying implementation, then the production of each group is going to be an O(n/2) operation on the average, as it repeatedly iterates over source
to produce the group.
This makes the overall partitioning operation, potentially quite expensive.
Then the total cost of the operation is something like O(n2/2s), right?
So, I would do something this, an O(n) operation (feel free to use an IGrouping
implementation if you'd like):
public static IEnumerable<KeyValuePair<int,T[]>> Partition<T>( this IEnumerable<T> source , int partitionSize )
{
if ( source == null ) throw new ArgumentNullException("source") ;
if ( partitionSize < 1 ) throw new ArgumentOutOfRangeException("partitionSize") ;
int i = 0 ;
List<T> partition = new List<T>( partitionSize ) ;
foreach( T item in source )
{
partition.Add(item) ;
if ( partition.Count == partitionSize )
{
yield return new KeyValuePair<int,T[]>( ++i , partition.ToArray() ) ;
partition.Clear() ;
}
}
// return the last partition if necessary
if ( partition.Count > 0 )
{
yield return new Partition<int,T>( ++i , items.ToArray() ) ;
}
}
Upvotes: 5
Reputation: 82287
Essentially you have an IEnumerable, and you want to group it into an IEnumerable of IGroupables which each contain the key as an index and the group as the values. Your version does seem to accomplish on the first pass, but I think that you can definitely stream line a little bit.
Using skip and take is the most desirable way to accomplish in my opinion, but the custom key for grouping is where there is an issue. There is a way around this which is to create your own class as a grouping template (seen in this answer: https://stackoverflow.com/a/5073144/1026459).
The end result is this:
public static class GroupExtension
{
public static IEnumerable<IGrouping<int, T>> GroupAt<T>(this IEnumerable<T> source, int itemsPerGroup)
{
for(int i = 0; i < (int)Math.Ceiling( (double)source.Count() / itemsPerGroup ); i++)
{
var currentGroup = new Grouping<int,T>{ Key = i };
currentGroup.AddRange(source.Skip(itemsPerGroup*i).Take(itemsPerGroup));
yield return currentGroup;
}
}
private class Grouping<TKey, TElement> : List<TElement>, IGrouping<TKey, TElement>
{
public TKey Key { get; set; }
}
}
And here is the demo in the fiddle which consumes it on a simple string
public class Program
{
public void Main(){
foreach(var p in getLine().Select(s => s).GroupAt(3))
Console.WriteLine(p.Aggregate("",(s,val) => s += val));
}
public string getLine(){ return "Hello World, how are you doing, this just some text to show how the grouping works"; }
}
edit
Alternatively as just an IEnumerable of IEnumerable
public static IEnumerable<IEnumerable<T>> GroupAt<T>(this IEnumerable<T> source, int itemsPerGroup)
{
for(int i = 0; i < (int)Math.Ceiling( (double)source.Count() / itemsPerGroup ); i++)
yield return source.Skip(itemsPerGroup*i).Take(itemsPerGroup);
}
Upvotes: 3
Reputation: 37520
Here's one way to do this using LINQ...
public static IEnumerable<IGrouping<int, TSource>> GroupBy<TSource>
(this IEnumerable<TSource> source, int itemsPerGroup)
{
return source.Zip(Enumerable.Range(0, source.Count()),
(s, r) => new { Group = r / itemsPerGroup, Item = s })
.GroupBy(i => i.Group, g => g.Item)
.ToList();
}
Upvotes: 18
Reputation: 101691
I think you are looking for something like this:
return source.Select((x, idx) => new { x, idx })
.GroupBy(x => x.idx / itemsPerGroup)
.Select(g => g.Select(a => a.x));
You need to change your return type as IEnumerable<IEnumerable<TSource>>
Upvotes: 11