Reputation: 3539
I have a class Event that has two properties: "ID" and "ExpirationTime".
I have a list with many events, some of which share the same ID.
I want to write an efficient LINQ query that returns the events distinct by ID, keeping, for each ID, the event with the smallest ExpirationTime.
Thanks!
Upvotes: 3
Views: 1553
Reputation: 128417
I believe this should outperform the GroupBy suggestion (see brief explanation below):
IEnumerable<Event> DistinctEvents(IEnumerable<Event> events)
{
    var dict = new Dictionary<int, Event>();
    foreach (Event e in events)
    {
        Event existing;
        if (!dict.TryGetValue(e.Id, out existing) || e.ExpirationTime < existing.ExpirationTime)
        {
            dict[e.Id] = e;
        }
    }
    foreach (Event e in dict.Values)
    {
        yield return e;
    }
}
Explanation: While this and the GroupBy method proposed by Ani have the same algorithmic complexity (as far as I can tell, anyway), the above approach is more efficient in practice for two reasons.
1. GroupBy internally uses a Lookup<TKey, TValue> (very similar to a Dictionary<TKey, List<TValue>>), which actually populates internal collections with the contents of the input sequence. This requires more memory and also costs time: although the sub-collections have amortized O(1) insertion, they occasionally need to resize themselves, which is O(N) (where N is the size of the sub-collection). This is not a big deal, but it is still a lot more work than you really need to be doing.
2. GroupBy does provide an enumerator (so it uses deferred execution), but the entire input sequence must be iterated before you can iterate over the result of GroupBy. Then you iterate over each group again in the call to Aggregate; so in all, you iterate over the elements of the input sequence twice, which is more than necessary to accomplish the task at hand.
As I said, the algorithmic complexity is the same, which means the two approaches should be equally scalable; this one is simply faster. I took the liberty of testing both approaches (out of curiosity, mostly) and found the above to execute in roughly half the time and cause fewer GC collections (a rough approximation of memory usage) than the GroupBy approach.
These are minute concerns, which it would normally be a waste of time to think too much about. The only reason I mention them is that you asked for an efficient solution (and even bolded the term); so I figured you would want to take these kinds of factors into consideration.
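If you want to run a comparison like this yourself, a rough harness could look like the sketch below. It is not the test described above: the data is synthetic (one million events over 1,000 IDs, an assumption), named tuples stand in for the Event class, and `ViaDictionary`/`ViaGroupBy` are hypothetical helper names. A single unwarmed Stopwatch run is only indicative; a tool such as BenchmarkDotNet gives more trustworthy numbers.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

// Synthetic input (assumption): 1,000,000 events spread over 1,000 IDs.
// Named tuples stand in for the question's Event class.
var rng = new Random(42);
var events = Enumerable.Range(0, 1_000_000)
    .Select(_ => (ID: rng.Next(1_000),
                  ExpirationTime: DateTime.UnixEpoch.AddMinutes(rng.Next(1_000_000))))
    .ToList();

// One pass: remember the earliest event seen so far for each ID.
static List<(int ID, DateTime ExpirationTime)> ViaDictionary(
    IEnumerable<(int ID, DateTime ExpirationTime)> source)
{
    var earliest = new Dictionary<int, (int ID, DateTime ExpirationTime)>();
    foreach (var e in source)
    {
        if (!earliest.TryGetValue(e.ID, out var existing) || e.ExpirationTime < existing.ExpirationTime)
            earliest[e.ID] = e;
    }
    return earliest.Values.ToList();
}

// GroupBy buffers every element into a group, then Aggregate walks each group again.
static List<(int ID, DateTime ExpirationTime)> ViaGroupBy(
    IEnumerable<(int ID, DateTime ExpirationTime)> source) =>
    source.GroupBy(e => e.ID)
          .Select(g => g.Aggregate((acc, cur) => acc.ExpirationTime < cur.ExpirationTime ? acc : cur))
          .ToList();

// A single, unwarmed run: treat the timings as a rough indication only.
var sw = Stopwatch.StartNew();
var dictResult = ViaDictionary(events);
Console.WriteLine($"Dictionary: {sw.ElapsedMilliseconds} ms");

sw.Restart();
var groupResult = ViaGroupBy(events);
Console.WriteLine($"GroupBy + Aggregate: {sw.ElapsedMilliseconds} ms");
```

Both helpers should return the same (ID, earliest time) pairs; only the order of the results may differ, so compare them after sorting by ID.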
Upvotes: 3
Reputation: 77606
I think this should do it:
events.GroupBy(x => x.ID, (key, items) => items.First(y => y.ExpirationTime == items.Min(z => z.ExpirationTime)))
This groups by ID and, for each group, selects the event in items (where items represents all the events with the same ID) with the smallest ExpirationTime.
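One caveat with the query above: items.Min(...) sits inside the predicate passed to First, so it is re-evaluated for every element First tests, which can make each group quadratic. A minimal sketch that computes the minimum once per group, using the same GroupBy overload, is shown below; the sample data is made up and named tuples stand in for Event.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample data; tuples stand in for the question's Event class.
var events = new List<(int ID, DateTime ExpirationTime)>
{
    (1, new DateTime(2024, 1, 3)),
    (1, new DateTime(2024, 1, 1)),
    (2, new DateTime(2024, 2, 1)),
    (2, new DateTime(2024, 1, 15)),
    (3, new DateTime(2024, 3, 1)),
};

// Same GroupBy(key, resultSelector) call, but the group's minimum is
// computed once instead of once per element tested by First.
var earliestPerId = events
    .GroupBy(x => x.ID, (key, items) =>
    {
        var min = items.Min(z => z.ExpirationTime);
        return items.First(y => y.ExpirationTime == min);
    })
    .ToList();

foreach (var e in earliestPerId)
    Console.WriteLine($"{e.ID}: {e.ExpirationTime:yyyy-MM-dd}");
```

Groups come out in the order their IDs first appear, so this prints the events for IDs 1, 2 and 3 with dates 2024-01-01, 2024-01-15 and 2024-03-01.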
Upvotes: 1
Reputation: 38152
Assuming you can implement IComparable on your Event class (since LINQ's Min doesn't have an overload returning the original item otherwise), you can do:
var distinct = events.GroupBy(evt => evt.Id).Select(grp => grp.Min());
Example:
void Main()
{
    var events = new List<Event>
    {
        new Event(1, DateTime.Now),
        new Event(1, DateTime.Now.AddDays(1)),
        new Event(2, DateTime.Now.AddDays(2)),
        new Event(2, DateTime.Now.AddDays(-22)),
    };
    var distinct = events.GroupBy(evt => evt.Id).Select(grp => grp.Min());
}
public class Event : IComparable<Event>
{
    public Event(int id, DateTime exp)
    {
        Id = id;
        Expiration = exp;
    }
    public int Id {get; set;}
    public DateTime Expiration {get; set;}
    public int CompareTo(Event other)
    {
        return Expiration.CompareTo(other.Expiration);
    }
}
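If you are on .NET 6 or later and would rather not add IComparable to Event, Enumerable.Min also has an overload that takes an IComparer<TSource>, so the ordering can be supplied at the call site. A minimal sketch, with named tuples standing in for Event and made-up dates:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample data; tuples stand in for the Event class above.
var events = new List<(int Id, DateTime Expiration)>
{
    (1, new DateTime(2024, 1, 1)),
    (1, new DateTime(2024, 1, 2)),
    (2, new DateTime(2024, 1, 3)),
    (2, new DateTime(2023, 12, 10)),
};

// Compare by Expiration only, without changing the element type.
var byExpiration = Comparer<(int Id, DateTime Expiration)>.Create(
    (a, b) => a.Expiration.CompareTo(b.Expiration));

// Enumerable.Min(source, comparer) requires .NET 6 or later.
var distinct = events
    .GroupBy(evt => evt.Id)
    .Select(grp => grp.Min(byExpiration))
    .ToList();

foreach (var evt in distinct)
    Console.WriteLine($"{evt.Id}: {evt.Expiration:yyyy-MM-dd}");
```

Passing the comparer explicitly keeps the sketch independent of whatever default ordering the element type has.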
Upvotes: 2
Reputation: 113462
The grouping is easy enough, but doing an efficient "MinBy" with standard LINQ to Objects is slightly messy:
var lowestByID = items.GroupBy(x => x.ID)
.Select(group => group.Aggregate((best, next) =>
best.ExpirationTime < next.ExpirationTime
? best : next));
It's cleaner with a MinBy operator, such as the one provided with MoreLinq:
var lowestByID = items.GroupBy(x => x.ID)
.Select(group => group.MinBy(x => x.ExpirationTime));
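Since .NET 6, System.Linq itself ships an Enumerable.MinBy(source, keySelector) that returns the element with the smallest key, so the cleaner form works without an extra package on those runtimes. A sketch with hypothetical data, using named tuples in place of Event:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample data; tuples stand in for the question's Event class.
var items = new List<(int ID, DateTime ExpirationTime)>
{
    (1, new DateTime(2024, 5, 1)),
    (2, new DateTime(2024, 4, 1)),
    (1, new DateTime(2024, 3, 1)),
    (2, new DateTime(2024, 6, 1)),
};

// Built-in Enumerable.MinBy (.NET 6+): one pass per group, returns the element itself.
var lowestByID = items
    .GroupBy(x => x.ID)
    .Select(group => group.MinBy(x => x.ExpirationTime))
    .ToList();

foreach (var e in lowestByID)
    Console.WriteLine($"{e.ID}: {e.ExpirationTime:yyyy-MM-dd}");
```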
Upvotes: 4
Reputation: 60095
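// events: the List<Event> from the question (it must not be null)
var earliestPerId = events
    .GroupBy( e => e.ID )
    .Select( g =>
    {
        // The question asks for the smallest ExpirationTime, so use Min (not Max),
        // computed once per group rather than once per element
        var min = g.Min( t => t.ExpirationTime );
        return g.First( e => e.ExpirationTime == min );
    } );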
Upvotes: 0
Reputation: 110211
(See also the related question "LINQ's Distinct() on a particular property".)
Simple! You want to group them and pick a winner out of the group.
List<Event> distinctEvents = allEvents
.GroupBy(e => e.Id)
.Select(g => g.OrderBy(e => e.ExpirationTime).First())
.ToList();
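Sorting every group just to take its first element costs O(k log k) per group. An alternative in the spirit of "Distinct() on a particular property", assuming .NET 6 or later for DistinctBy, is to sort the whole list once and keep the first event per Id. A sketch with made-up data and named tuples standing in for Event:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample data; tuples stand in for the question's Event class.
var allEvents = new List<(int Id, DateTime ExpirationTime)>
{
    (1, new DateTime(2024, 1, 10)),
    (2, new DateTime(2024, 1, 5)),
    (1, new DateTime(2024, 1, 2)),
    (2, new DateTime(2024, 1, 20)),
};

// Sort once by ExpirationTime, then DistinctBy (.NET 6+) keeps the first
// element it sees for each Id, which after the sort is the earliest one.
List<(int Id, DateTime ExpirationTime)> distinctEvents = allEvents
    .OrderBy(e => e.ExpirationTime)
    .DistinctBy(e => e.Id)
    .ToList();

foreach (var e in distinctEvents)
    Console.WriteLine($"{e.Id}: {e.ExpirationTime:yyyy-MM-dd}");
```

Note the result is ordered by ExpirationTime rather than by first appearance of each Id.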
Upvotes: 3
Reputation: 18306
events.GroupBy(e => e.ID).Select(g => new { ID = g.Key, Time = g.Min(e => e.ExpirationTime) });
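This query yields anonymous objects holding only the ID and its earliest time, not the Event instances themselves. That is enough when all you need is a lookup from ID to earliest ExpirationTime; a sketch of that use, with hypothetical data and named tuples standing in for Event:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample data; tuples stand in for the question's Event class.
var events = new List<(int ID, DateTime ExpirationTime)>
{
    (7, new DateTime(2024, 8, 1)),
    (7, new DateTime(2024, 7, 1)),
    (9, new DateTime(2024, 9, 1)),
};

// Map each ID to its earliest ExpirationTime (no Event instances are kept).
Dictionary<int, DateTime> earliestTimeById = events
    .GroupBy(e => e.ID)
    .ToDictionary(g => g.Key, g => g.Min(e => e.ExpirationTime));

Console.WriteLine($"ID 7 expires first on {earliestTimeById[7]:yyyy-MM-dd}");
```

If you need the Event objects as well, one of the MinBy or Aggregate variants is the better fit.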
Upvotes: 1