CruelIO

Reputation: 18624

Best method to remove items from a list

I have a list of 500,000 to 1,000,000 instances of MyClass, which has these properties:

class MyClass
{
    public string ParentId { get; set; }
    public string Name { get; set; }
    public DateTime StartDate { get; set; }
    public DateTime EndDate { get; set; }
}

The data could look like this:

ParentId | Name    | StartDate    | EndDate
----------------------------------------------
parent1  | alpha   | 01-01-2011   | 02-02-2015
parent1  | beta    | 01-01-2011   | 02-02-2014
parent2  | gamma   | 01-01-2012   | 02-02-2011

I need to filter the list so it contains the "alpha" and "gamma" objects. The "beta" object should be excluded because it has the same parent as alpha, but an earlier EndDate.

I.e. the resulting list should only contain one instance per ParentId (the one with the latest EndDate).

The filtering needs to perform well.

Upvotes: 2

Views: 117

Answers (5)

M4N

Reputation: 96561

While the currently accepted answer (by @Kobi) is correct and probably the simplest solution, it might not be the "best" one.

Since you mentioned that the list can contain quite a lot of items and that the solution should perform well, I checked how a solution without LINQ performs.

This is my solution:

var tempDict = new Dictionary<string, MyClass>();
foreach (var data in list) // list is the List<MyClass>
{
    MyClass existing;
    if (!tempDict.TryGetValue(data.ParentId, out existing))
    {
        // Put item into temp dictionary (use ParentId as key)
        tempDict[data.ParentId] = data;
    }
    else
    {
        // Check if the instance in the temp dictionary has an
        // earlier EndDate. If yes, replace it.
        if (existing.EndDate < data.EndDate) // replace
            tempDict[data.ParentId] = data;
    }
}

var result = tempDict.Values.ToList();

A quick comparison (using 500,000 items) showed that this solution is about 3 to 4 times faster than the LINQ version (depending on the number of unique ParentId values).
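For anyone who wants to try a comparison like this themselves, here is a minimal timing sketch. It is not the code used for the measurement above: the `Item` type (a stand-in for MyClass with public properties), the `Generate` helper, the seed, and the figure of 50,000 distinct parents are all assumptions made for illustration, and Stopwatch timings will vary with machine, runtime, and data distribution.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

// Stand-in for MyClass from the question (hypothetical name, public properties assumed).
public class Item
{
    public string ParentId { get; set; }
    public string Name { get; set; }
    public DateTime StartDate { get; set; }
    public DateTime EndDate { get; set; }
}

public static class LatestPerParent
{
    // Dictionary approach: one pass, keeping the item with the latest EndDate per ParentId.
    public static List<Item> WithDictionary(IEnumerable<Item> items)
    {
        var latest = new Dictionary<string, Item>();
        foreach (var item in items)
        {
            Item existing;
            if (!latest.TryGetValue(item.ParentId, out existing) || existing.EndDate < item.EndDate)
                latest[item.ParentId] = item;
        }
        return latest.Values.ToList();
    }

    // LINQ approach: group by ParentId, sort each group by EndDate, take the first item.
    public static List<Item> WithLinq(IEnumerable<Item> items)
    {
        return items.GroupBy(i => i.ParentId)
                    .Select(g => g.OrderByDescending(i => i.EndDate).First())
                    .ToList();
    }

    // Synthetic test data; the seed and value ranges are arbitrary assumptions.
    public static List<Item> Generate(int count, int parents, int seed = 42)
    {
        var rng = new Random(seed);
        var start = new DateTime(2011, 1, 1);
        return Enumerable.Range(0, count)
            .Select(n => new Item
            {
                ParentId = "parent" + rng.Next(parents),
                Name = "item" + n,
                StartDate = start,
                EndDate = start.AddDays(rng.Next(3650))
            })
            .ToList();
    }

    public static void Main()
    {
        var list = Generate(500000, 50000);

        var sw = Stopwatch.StartNew();
        var viaDictionary = WithDictionary(list);
        sw.Stop();
        Console.WriteLine("Dictionary: {0} ms, {1} items", sw.ElapsedMilliseconds, viaDictionary.Count);

        sw.Restart();
        var viaLinq = WithLinq(list);
        sw.Stop();
        Console.WriteLine("LINQ:       {0} ms, {1} items", sw.ElapsedMilliseconds, viaLinq.Count);
    }
}
```

A single Stopwatch run like this is only a rough indication; for numbers you want to rely on, repeat the runs and discard the first (JIT warm-up) iteration, or use a dedicated benchmarking tool.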

Upvotes: 2

Darkseal

Reputation: 9564

I assume you want to filter out beta for the reasons you explained and not because of its name. Here's what you can use to achieve that:

myClasses.GroupBy(c => c.ParentId)
    .Select(g => g.OrderByDescending(c => c.EndDate).First());

Upvotes: 2

Bondaryuk Vladimir

Reputation: 519

You can use this; it works fine with large lists:

var groupedList = yourList
    .GroupBy(x => x.ParentId, (key, set) => new { Latest = set.Max(r => r.EndDate), Items = set })
    .Select(g => g.Items.First(s => s.EndDate == g.Latest)) // Max is computed once per group
    .ToList();

Upvotes: 0

Kobi

Reputation: 138017

You can use GroupBy and Select:

var filtered = list
              .GroupBy(mc=>mc.ParentId)
              .Select(g=>g.OrderByDescending(mc=>mc.EndDate).First())
              .ToList();
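If you are on .NET 6 or later, Enumerable.MaxBy can pick the latest item of each group with a single scan instead of sorting the group first. A minimal sketch, assuming MyClass exposes public properties (the question shows the members without access modifiers); the `MaxByExample` class and the `LatestPerParent` helper name are made up for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class MyClass
{
    public string ParentId { get; set; } = "";
    public string Name { get; set; } = "";
    public DateTime StartDate { get; set; }
    public DateTime EndDate { get; set; }
}

public static class MaxByExample
{
    // Hypothetical helper: keeps the item with the latest EndDate per ParentId.
    public static List<MyClass> LatestPerParent(IEnumerable<MyClass> list)
    {
        return list
            .GroupBy(mc => mc.ParentId)
            .Select(g => g.MaxBy(mc => mc.EndDate)) // one scan per group, no sort
            .ToList();
    }

    public static void Main()
    {
        // The three sample rows from the question.
        var list = new List<MyClass>
        {
            new MyClass { ParentId = "parent1", Name = "alpha", StartDate = new DateTime(2011, 1, 1), EndDate = new DateTime(2015, 2, 2) },
            new MyClass { ParentId = "parent1", Name = "beta",  StartDate = new DateTime(2011, 1, 1), EndDate = new DateTime(2014, 2, 2) },
            new MyClass { ParentId = "parent2", Name = "gamma", StartDate = new DateTime(2012, 1, 1), EndDate = new DateTime(2011, 2, 2) },
        };

        foreach (var mc in LatestPerParent(list))
            Console.WriteLine(mc.ParentId + ": " + mc.Name);
    }
}
```

For the sample rows from the question this keeps alpha and gamma. On older frameworks MaxBy is not available in System.Linq, so the OrderByDescending(...).First() form above remains the portable choice.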

Upvotes: 5

Dave Bish

Reputation: 19646

You can easily filter a List<T> using LINQ's Where:

var result = myList
    .Where(item => item.Name == "gamma" || item.Name == "alpha")
    .ToList();

If you want the output to be distinct on a certain field, you can use either MoreLinq's DistinctBy or GroupBy:

var result = myList
    .Where(item => item.Name == "gamma" || item.Name == "alpha")
    .GroupBy(item => item.ParentId)
    .Select(g => g.First()) //Selection logic
    .ToList();

Upvotes: 0
