Mario Tacke

Reputation: 5488

Linq/C#: How to split List into variable length chunks based on list item information?

I am trying to split up a list of type Record in Linq into sub lists based on certain Type information. There is always one record with type "a" before and one with type "b" after each group of records. I have a class Record:

class Record
{
    public string Type { get; set; }
    public string SomeOtherInformation { get; set; }
}

Here is a sample list (List<Record> records):

Type    SomeOtherInformation
a       ......
x       ......
x       ......
b       ......
a       ......
b       ......
a       ......
x       ......
x       ......
x       ......
x       ......
x       ......
b       ......

The desired output is (List<List<Record>> lists):

List #1:        List #2:        List #3:
a       ......  a       ......  a       ......
x       ......  b       ......  x       ......
x       ......                  x       ......
b       ......                  x       ......
                                x       ......
                                x       ......
                                b       ......

I am currently going through this list with a for loop: I create a new list whenever the type is "a", and add it to the list of sub-lists when an item's type is "b". I am wondering if there is a better way to do this with Linq. Can this be done with Linq, and if so, how?
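
For reference, the loop described above might look roughly like the sketch below. This is not the actual code from the question: `RecordSplitter` and `SplitByMarkers` are hypothetical names, the Record class is repeated only to keep the example self-contained, and skipping records that fall outside an a..b pair is an assumption.

```csharp
using System;
using System.Collections.Generic;

class Record
{
    public string Type { get; set; }
    public string SomeOtherInformation { get; set; }
}

static class RecordSplitter // hypothetical helper, not from the question
{
    public static List<List<Record>> SplitByMarkers(List<Record> records)
    {
        var lists = new List<List<Record>>();
        List<Record> current = null; // null means no sub-list is open

        for (int i = 0; i < records.Count; i++)
        {
            Record item = records[i];

            // An "a" record opens a new sub-list.
            if (item.Type == "a")
            {
                current = new List<Record>();
            }

            // Assumption: records seen outside an a..b pair are skipped.
            if (current == null)
            {
                continue;
            }

            current.Add(item);

            // A "b" record closes the sub-list and stores it.
            if (item.Type == "b")
            {
                lists.Add(current);
                current = null;
            }
        }

        return lists;
    }
}
```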

Upvotes: 3

Views: 1332

Answers (4)

Jeff Mercado

Reputation: 134881

As already mentioned, this is not a case that LINQ handles well, because you can only really make decisions based on the current item, not on what was previously seen. You need to maintain some kind of state to keep track of the groupings. Relying on side effects inside a query to do that is not idiomatic.

Writing your own extension method would be the better option. You can keep state and make it all self-contained (much like the existing operators such as GroupBy() and others). Here's an implementation I have that can optionally include items that fall outside the start and end delimiters.

// Needs "using System.Collections.Immutable;" and must be declared inside a static class.
public static IEnumerable<IImmutableList<TSource>> GroupByDelimited<TSource>(
    this IEnumerable<TSource> source,
    Func<TSource, bool> startDelimiter,
    Func<TSource, bool> endDelimiter,
    bool includeUndelimited = false)
{
    var delimited = default(ImmutableList<TSource>.Builder);
    var undelimited = default(ImmutableList<TSource>.Builder);
    foreach (var item in source)
    {
        if (delimited == null)
        {
            if (startDelimiter(item))
            {
                if (includeUndelimited && undelimited != null)
                {
                    yield return undelimited.ToImmutable();
                    undelimited = null;
                }
                delimited = ImmutableList.CreateBuilder<TSource>();
            }
            else if (includeUndelimited)
            {
                if (undelimited == null)
                {
                    undelimited = ImmutableList.CreateBuilder<TSource>();
                }
                undelimited.Add(item);
            }
        }
        if (delimited != null)
        {
            delimited.Add(item);
            if (endDelimiter(item))
            {
                yield return delimited.ToImmutable();
                delimited = null;
            }
        }
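    }
    // Flush undelimited items left over after the last delimited group.
    if (includeUndelimited && undelimited != null)
    {
        yield return undelimited.ToImmutable();
    }
}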

However, if you really wanted to, you could still do this using a LINQ operator (Aggregate()), but it would not be a real LINQ solution. Again, it looks like a self-contained foreach loop.

var result = records.Aggregate(
    // Item1: the group being built (null when none is open); Item2: the finished groups.
    Tuple.Create(default(List<Record>), new List<List<Record>>()),
    (acc, record) =>
    {
        var grouping = acc.Item1;
        var groups = acc.Item2;
        // An "a" record opens a new group if none is open.
        if (grouping == null && record.Type == "a")
        {
            grouping = new List<Record>();
        }
        if (grouping != null)
        {
            grouping.Add(record);
            // A "b" record closes the open group.
            if (record.Type == "b")
            {
                groups.Add(grouping);
                grouping = null;
            }
        }
        return Tuple.Create(grouping, groups);
    },
    acc => acc.Item2
);

Upvotes: 0

Jon Skeet

Reputation: 1500785

You can't cleanly do this with normal LINQ, as far as I'm aware. The streaming operators within LINQ rely on you being able to make a decision about an item (e.g. whether or not to filter it, how to project it, how to group it) based on just that item, and possibly its index within the original source. In your case, you really need more information than that - you need to know how many b items you've already seen.

You could do it like this:

int bs = 0;
var groups = records.GroupBy(item => item.Type == "b" ? bs++ : bs,
                             (key, group) => group.ToList())
                    .ToList();

However, that relies on the side effect of bs++ within the grouping projection (to keep track of how many b items we've already seen). It's definitely not idiomatic LINQ, and I wouldn't recommend it.
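
To see what the counter does, here is a small self-contained sketch that runs the same grouping over the Type sequence from the question. The Record class is repeated from the question; `GroupingDemo` and `GroupByEndMarker` are hypothetical names used only for illustration, and the sketch assumes LINQ to Objects, where the key selector runs once per element in source order.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Record
{
    public string Type { get; set; }
    public string SomeOtherInformation { get; set; }
}

static class GroupingDemo // hypothetical name, for illustration only
{
    public static List<List<Record>> GroupByEndMarker(IEnumerable<Record> records)
    {
        // Counts the "b" records seen so far; mutated inside the key selector.
        int bs = 0;

        // A "b" record takes the current count as its key and then bumps it,
        // so every record up to and including a "b" shares one key.
        return records
            .GroupBy(item => item.Type == "b" ? bs++ : bs,
                     (key, group) => group.ToList())
            .ToList();
    }

    static void Main()
    {
        var types = new[] { "a", "x", "x", "b", "a", "b", "a", "x", "x", "x", "x", "x", "b" };
        var records = types.Select(t => new Record { Type = t }).ToList();

        // Prints three lines:
        //   a x x b
        //   a b
        //   a x x x x x b
        foreach (var chunk in GroupByEndMarker(records))
        {
            Console.WriteLine(string.Join(" ", chunk.Select(r => r.Type)));
        }
    }
}
```

Because the counter is a local that the lambda captures, the query has to be materialized (here with ToList()) before anything else touches it; enumerating a deferred version twice would continue counting from where the first pass stopped.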

Upvotes: 5

BradleyDotNET

Reputation: 61349

Definitely not pure LINQ, but I could imagine using TakeWhile in a loop to do this:

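List<Record> data; // the input records; assumed to be populated elsewhere
List<List<Record>> result = new List<List<Record>>();

IEnumerable<Record> workingData = data;
while (workingData.Any())
{
    // Take the leading 'a', then everything up to (but not including) the next 'a'.
    // Type is a char here, as in the test program below; use "a" if Type is a string.
    IEnumerable<Record> subList = workingData.Take(1).Concat(workingData.Skip(1).TakeWhile(c => c.Type != 'a'));
    result.Add(subList.ToList());
    // Drop the records just consumed from the working set.
    workingData = workingData.Except(subList);
}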

To explain, we get the 'a' we know is at the start of our sequence, then skip it and take items until we encounter another 'a'. This makes up one of the sub-lists, so we add it to our result. Then we remove this subList from our "working" set, and enumerate again until we run out of elements.

I'm not sure this would be better than your existing solution, but hopefully it helps!

This does work (tested on VS 2013, .NET 4.5.1) as long as workingData, not data, is used inside the loop (a typo on my part, fixed above). Except uses the default equality comparer, and since Record does not override .Equals, that compares references (effectively the pointers). Thus, duplicate data is not a problem. If .Equals were overridden, you would need to ensure each record was unique.

If anyone would like to verify this, here is my test program (put a breakpoint on the Console.ReadKey call and you'll see that result has the correct data):

using System;
using System.Collections.Generic;
using System.Linq;

class Program
{
    static void Main(string[] args)
    {
        List<Record> testData = new List<Record>()
        {
            new Record() { Type = 'a', Data="Data" },
            new Record() { Type = 'x', Data="Data" },
            new Record() { Type = 'b', Data="Data" },
            new Record() { Type = 'a', Data="Data" },
            new Record() { Type = 'x', Data="Data" },
            new Record() { Type = 'x', Data="Data" },
            new Record() { Type = 'x', Data="Data" },
            new Record() { Type = 'x', Data="Data" },
            new Record() { Type = 'x', Data="Data" },
            new Record() { Type = 'x', Data="Data" },
            new Record() { Type = 'b', Data="Data" },
            new Record() { Type = 'a', Data="Data" },
            new Record() { Type = 'x', Data="Data" },
            new Record() { Type = 'x', Data="Data" },
            new Record() { Type = 'x', Data="Data" },
            new Record() { Type = 'b', Data="Data" }
        };

        List<List<Record>> result = new List<List<Record>>();

        IEnumerable<Record> workingData = testData;
        while (workingData.Count() > 0)
        {
            IEnumerable<Record> subList = workingData.Take(1).Concat(workingData.Skip(1).TakeWhile(c => c.Type != 'a'));
            result.Add(subList.ToList());
            workingData = workingData.Except(subList);
        }

        Console.ReadKey();
    }
}

class Record
{
    public char Type;
    public String Data;
}

Upvotes: 1

Selman Genç

Reputation: 101691

I would use an extension method for this instead:

public static IEnumerable<IEnumerable<TSource>> SplitItems<TSource>(
    this IEnumerable<TSource> source,
    Func<TSource, bool> startItem,
    Func<TSource, bool> endItem)
{
    var tempList = new List<TSource>();
    int counter = 0;
    foreach (var item in source)
    {
        // Every start or end item bumps the counter, so it is odd while a chunk is open.
        if (startItem(item) || endItem(item)) counter++;
        tempList.Add(item);
        // With well-formed input, an even counter means an end item has just closed the chunk.
        if (counter % 2 == 0)
        {
            yield return tempList;
            tempList = new List<TSource>();
        }
    }
}

Here is the usage:

var result = list.SplitItems(x => x.Type == "a", x => x.Type == "b").ToList();

This will return a List<IEnumerable<Record>> with 3 items. Of course, the method assumes that there is a start item at the beginning and an end item at the end. You may want to add some checks and improve it according to your requirements.
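
One way such checks could look, as a sketch only: track explicitly whether a chunk is open instead of relying on the parity of a counter. `SplitExtensions` and `SplitDelimited` are hypothetical names. The sketch assumes that items outside a start/end pair should be skipped and that a trailing chunk which never reaches an end item should be dropped, which may or may not match your requirements.

```csharp
using System;
using System.Collections.Generic;

static class SplitExtensions // hypothetical helper class
{
    public static IEnumerable<List<TSource>> SplitDelimited<TSource>(
        this IEnumerable<TSource> source,
        Func<TSource, bool> startItem,
        Func<TSource, bool> endItem)
    {
        List<TSource> current = null; // null means no chunk is open

        foreach (var item in source)
        {
            if (current == null)
            {
                // Outside a chunk: only a start item opens one, anything else is skipped.
                if (!startItem(item))
                {
                    continue;
                }
                current = new List<TSource> { item };
                continue;
            }

            // Inside a chunk: collect everything, including further start items.
            current.Add(item);

            // An end item closes the chunk and hands it to the caller.
            if (endItem(item))
            {
                yield return current;
                current = null;
            }
        }
        // A chunk still open here never saw its end item and is dropped (assumption).
    }
}
```

Usage would be the same as above, for example `list.SplitDelimited(x => x.Type == "a", x => x.Type == "b").ToList()`.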

Upvotes: 2
