Bug

Reputation: 942

LINQ Distinct on a particular property and latest

Suppose I have the following collection:

public class User
{
    public string SSN { get; set; }
    public DateTime StartDate { get; set; }
}

var users = new List<User>
{
    new User {  SSN = "ab", StartDate = new DateTime(2021, 01, 01) },
    new User {  SSN = "ab", StartDate = new DateTime(2021, 01, 02) }, // take this

    new User {  SSN = "ac", StartDate = new DateTime(2021, 01, 01) },
    new User {  SSN = "ac", StartDate = new DateTime(2021, 02, 01) }, // take this

    new User {  SSN = "ad", StartDate = new DateTime(2020, 01, 01) },
    new User {  SSN = "ad", StartDate = new DateTime(2021, 01, 01) },
    new User {  SSN = "ad", StartDate = new DateTime(2022, 01, 01) }, // take this
};

What I am trying to do is get the distinct SSNs, keeping only the entry with the latest StartDate for each one. I wrote two queries that seem to work. Is there a better way in terms of performance?

// User objects: only the latest entry per SSN is selected
var latest = users
    .OrderByDescending(p => p.StartDate)
    .GroupBy(g => g.SSN)
    .Select(x => x.First())
    .ToList();

// SSNs only
var ssn = users
    .OrderByDescending(p => p.StartDate)
    .GroupBy(g => g.SSN)
    .Select(x => x.First())
    .Select(x => x.SSN)
    .ToList();
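The first query works because `GroupBy` keeps the elements of each group in source order. After the descending sort, `First()` therefore picks the latest entry for every SSN, and the O(n log n) sort dominates the cost. Below is a minimal, self-contained check against the sample data. The named tuple standing in for `User` is an assumption made only to keep the snippet short:

```csharp
using System;
using System.Linq;

// Named tuples stand in for the User class from the question (assumption for brevity).
var users = new[]
{
    (SSN: "ab", StartDate: new DateTime(2021, 01, 01)),
    (SSN: "ab", StartDate: new DateTime(2021, 01, 02)),
    (SSN: "ac", StartDate: new DateTime(2021, 01, 01)),
    (SSN: "ac", StartDate: new DateTime(2021, 02, 01)),
    (SSN: "ad", StartDate: new DateTime(2020, 01, 01)),
    (SSN: "ad", StartDate: new DateTime(2021, 01, 01)),
    (SSN: "ad", StartDate: new DateTime(2022, 01, 01)),
};

// Sort once (O(n log n)), then group. GroupBy keeps the sorted order
// inside each group, so First() is the latest entry for that SSN.
var latest = users
    .OrderByDescending(u => u.StartDate)
    .GroupBy(u => u.SSN)
    .Select(g => g.First())
    .ToList();

foreach (var u in latest)
    Console.WriteLine($"{u.SSN} {u.StartDate:yyyy-MM-dd}");
```

The groups come out in the sorted order (ad, ac, ab), not in the original order of the list.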

Upvotes: 0

Views: 64

Answers (4)

Orace

Reputation: 8359

What I am trying to do is to get SSN distinct but by only latest StartDate:

You just need to do the operations in the right order: first group by SSN, then take the latest element (by StartDate) in each group:

var result = users.GroupBy(u => u.SSN)                     // distinct
                  .Select(g => g.MaxBy(u => u.StartDate)); // latest
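`Enumerable.MaxBy` was added in .NET 6. On older targets, one possible substitute (not part of this answer) is `Aggregate`, which walks each group and keeps whichever element has the later `StartDate`. Groups produced by `GroupBy` are never empty, so the seedless `Aggregate` overload is safe here. This is a sketch that uses a named tuple in place of `User`:

```csharp
using System;
using System.Linq;

// Named tuples stand in for the User class from the question (assumption for brevity).
var users = new[]
{
    (SSN: "ab", StartDate: new DateTime(2021, 01, 01)),
    (SSN: "ab", StartDate: new DateTime(2021, 01, 02)),
    (SSN: "ac", StartDate: new DateTime(2021, 01, 01)),
    (SSN: "ac", StartDate: new DateTime(2021, 02, 01)),
    (SSN: "ad", StartDate: new DateTime(2020, 01, 01)),
    (SSN: "ad", StartDate: new DateTime(2021, 01, 01)),
    (SSN: "ad", StartDate: new DateTime(2022, 01, 01)),
};

// Group by SSN, then reduce each group to its element with the latest StartDate.
// On equal dates the element seen first is kept.
var result = users
    .GroupBy(u => u.SSN)
    .Select(g => g.Aggregate((best, next) => next.StartDate > best.StartDate ? next : best))
    .ToList();

foreach (var u in result)
    Console.WriteLine($"{u.SSN} {u.StartDate:yyyy-MM-dd}");
```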

Upvotes: 1

Good Night Nerd Pride

Reputation: 8452

For better performance, avoid OrderBy(), as the accepted answer does.

But with MaxBy() you can also avoid creating a new User instance:

var latest = users
    .GroupBy(u => u.SSN)
    .Select(us => us.MaxBy(y => y.StartDate));
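`GroupBy` still buffers every element before `MaxBy` runs. If memory matters, a plain loop over a dictionary can do the same job in one pass, keeping only the current latest entry per SSN. This alternative technique does not come from the answers. The sketch below uses a named tuple in place of `User`. Note that the enumeration order of `Dictionary.Values` is not guaranteed, so the sketch sorts by SSN at the end:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Named tuples stand in for the User class from the question (assumption for brevity).
var users = new[]
{
    (SSN: "ab", StartDate: new DateTime(2021, 01, 01)),
    (SSN: "ab", StartDate: new DateTime(2021, 01, 02)),
    (SSN: "ac", StartDate: new DateTime(2021, 01, 01)),
    (SSN: "ac", StartDate: new DateTime(2021, 02, 01)),
    (SSN: "ad", StartDate: new DateTime(2020, 01, 01)),
    (SSN: "ad", StartDate: new DateTime(2021, 01, 01)),
    (SSN: "ad", StartDate: new DateTime(2022, 01, 01)),
};

// One pass: remember only the latest entry seen so far for each SSN.
var latestBySsn = new Dictionary<string, (string SSN, DateTime StartDate)>();
foreach (var u in users)
{
    // Store u if its SSN is new, or if it starts later than the stored entry.
    if (!latestBySsn.TryGetValue(u.SSN, out var current) || u.StartDate > current.StartDate)
        latestBySsn[u.SSN] = u;
}

// Dictionary order is not guaranteed, so sort by SSN if order matters.
var latest = latestBySsn.Values.OrderBy(u => u.SSN).ToList();

foreach (var u in latest)
    Console.WriteLine($"{u.SSN} {u.StartDate:yyyy-MM-dd}");
```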

Upvotes: 2

Ondrej Tucny

Reputation: 27962

If you group by SSN, and then select the same SSN, the values of StartDate are completely irrelevant.

Hence, the list of distinct SSNs can be obtained by selecting the Key of each grouping, like this:

var ssns = users.GroupBy(u => u.SSN).Select(g => g.Key);
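If only the SSNs are needed, `Select` followed by `Distinct` gives the same set of keys without building the groups. Below is a small comparison on the sample data. Named tuples stand in for `User`, an assumption made only for brevity:

```csharp
using System;
using System.Linq;

// Named tuples stand in for the User class from the question (assumption for brevity).
var users = new[]
{
    (SSN: "ab", StartDate: new DateTime(2021, 01, 01)),
    (SSN: "ab", StartDate: new DateTime(2021, 01, 02)),
    (SSN: "ac", StartDate: new DateTime(2021, 01, 01)),
    (SSN: "ac", StartDate: new DateTime(2021, 02, 01)),
    (SSN: "ad", StartDate: new DateTime(2020, 01, 01)),
    (SSN: "ad", StartDate: new DateTime(2021, 01, 01)),
    (SSN: "ad", StartDate: new DateTime(2022, 01, 01)),
};

// Keys of the groups: one entry per distinct SSN.
var viaGroupBy = users.GroupBy(u => u.SSN).Select(g => g.Key).ToList();

// Same set of SSNs, without materializing the groups.
var viaDistinct = users.Select(u => u.SSN).Distinct().ToList();

Console.WriteLine(string.Join(", ", viaDistinct));
```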

Update: Following your comments, if you need the whole User object with the latest StartDate, you can do something like this:

var latestUsers = users
    .GroupBy(u => u.SSN)
    .Select(g => g.OrderByDescending(u => u.StartDate).First());

For another solution, see also Yong Shun's answer.

Upvotes: 0

Yong Shun

Reputation: 51160

Since you mentioned in the comments that you also need the latest StartDate, group by SSN and get the latest StartDate via .Max():

var result = users
    .GroupBy(g => g.SSN)
    .Select(x => new User
    {
        SSN = x.Key,
        StartDate = x.Max(y => y.StartDate)
    })
    .ToList();

Upvotes: 3