user26411

Reputation: 121

How to efficiently select certain items from two lists using LINQ

I have these 3 classes:

Code:

public class Employee
{
    public Guid Id { get; set; }
    public string Name { get; set; }
    public int Age { get; set; }
    public string Gender { get; set; }
    public long TimeStamp { get; set; }
}
    
public class Student
{
    public Guid Id { get; set; }
    public string Name { get; set; }
    public int Age { get; set; }
    public long TimeStamp { get; set; }
}
    
public class Person
{
    public string Name { get; set; }
    public int Age { get; set; }
}

I create 4 lists:

var studentList = new List<Student>(); // fill the list with a lot of students
var employeeList = new List<Employee>(); // fill the list with a lot of employees
var personList1 = new List<Person>();
var personList2 = new List<Person>();

Select all students and employees

var allStudents = studentList.Select(a => a); // does not make a lot of sense, but it is just for testing
var allEmployee = employeeList.Select(b => b);

I want to map allStudents to Person objects:

personList1.AddRange(allStudents.Select(a => new Person()
            {
               Age = a.Age,
               Name = a.Name
            } ));

I want to get all employees whose TimeStamp value does not appear in the allStudents list:

var allEmployeesWithDifferentTimeStampThanStundent =
    allEmployee.Where(a => !allStudents.Select(b =>b.TimeStamp).Contains(a.TimeStamp));

mapping again

personList2.AddRange(allEmployeesWithDifferentTimeStampThanStundent.Select
(a => new Person()
    {
    Age = a.Age,
    Name = a.Name
    } ));

merge both lists

personList1.AddRange(personList2);

Is there a better and more efficient way to do this?

Upvotes: 3

Views: 11098

Answers (2)

devgeezer

Reputation: 4189

The personList2 variable appears only to be there as an intermediate for projecting to the Person type -- if that's the case, you could skip its creation and use query syntax like so:

var personsFromNonMatchingEmployees =
    from employee in allEmployee
    join student in allStudents
    on employee.TimeStamp equals student.TimeStamp into studentsWithMatchingTimeStamp
    where !studentsWithMatchingTimeStamp.Any()
    select new Person { Age = employee.Age, Name = employee.Name };

personList1.AddRange(personsFromNonMatchingEmployees);

This is similar to the other GroupJoin approach, since the compiler translates the query above into a GroupJoin call. The join/group-join should perform better than the Where..Contains approach because it builds a hash lookup of the student timestamps once, whereas Where..Contains re-scans allStudents for every employee. In other words, it's an algorithmic Big-O improvement (roughly O(n + m) instead of O(n * m)) that should be quite noticeable for any more than a few Student or Employee instances.
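
To make the hashing point concrete, here is a minimal sketch of the same filter written with an explicit HashSet<long> instead of a join. It assumes the Student, Employee and Person classes from the question; the Demo class, the Merge helper, the studentTimeStamps name and the sample data are made up for illustration. It is an equivalent hash-based formulation, not what the compiler generates for the query above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Employee
{
    public Guid Id { get; set; }
    public string Name { get; set; }
    public int Age { get; set; }
    public string Gender { get; set; }
    public long TimeStamp { get; set; }
}

public class Student
{
    public Guid Id { get; set; }
    public string Name { get; set; }
    public int Age { get; set; }
    public long TimeStamp { get; set; }
}

public class Person
{
    public string Name { get; set; }
    public int Age { get; set; }
}

public static class Demo
{
    // Hypothetical helper: all students, plus every employee whose
    // TimeStamp does not occur among the students, projected to Person.
    public static List<Person> Merge(List<Student> students, List<Employee> employees)
    {
        // Built once; each Contains call below is then a hash lookup
        // rather than a scan over all students.
        var studentTimeStamps = new HashSet<long>(students.Select(s => s.TimeStamp));

        var persons = students
            .Select(s => new Person { Age = s.Age, Name = s.Name })
            .ToList();

        persons.AddRange(employees
            .Where(e => !studentTimeStamps.Contains(e.TimeStamp))
            .Select(e => new Person { Age = e.Age, Name = e.Name }));

        return persons;
    }

    public static void Main()
    {
        // Made-up sample data: Cid shares a TimeStamp with Ann and is filtered out.
        var students = new List<Student>
        {
            new Student { Name = "Ann", Age = 20, TimeStamp = 100 },
            new Student { Name = "Bob", Age = 22, TimeStamp = 200 }
        };
        var employees = new List<Employee>
        {
            new Employee { Name = "Cid", Age = 30, TimeStamp = 100 },
            new Employee { Name = "Dee", Age = 41, TimeStamp = 300 }
        };

        foreach (var p in Merge(students, employees))
            Console.WriteLine(p.Name + " " + p.Age);
    }
}
```

Whether this reads better than the join is a matter of taste; the cost profile should be about the same, since both build one hash structure over the student timestamps.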

By selecting the new Person object in the query, I'm able to bypass the personList2 list altogether. I find that I'm almost always able to eliminate temporary lists by doing selects like this that project to the type that I'm really interested in. I also left out the () on the new Person { .. } since the compiler doesn't require it.

Shy of changing up the inheritance and making Employee : Person & Student : Person, I don't think there's much more to improve.
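
For what it's worth, the inheritance change could look roughly like the sketch below. This is only an assumption about how the classes might be reshaped; the InheritanceDemo class, the Merge helper and the sample data are hypothetical. One difference to be aware of: the resulting list holds the original Student and Employee instances typed as Person, not fresh Person copies.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Person
{
    public string Name { get; set; }
    public int Age { get; set; }
}

public class Student : Person
{
    public Guid Id { get; set; }
    public long TimeStamp { get; set; }
}

public class Employee : Person
{
    public Guid Id { get; set; }
    public string Gender { get; set; }
    public long TimeStamp { get; set; }
}

public static class InheritanceDemo
{
    // Hypothetical helper: no projection step is needed because
    // Student and Employee already are Persons.
    public static List<Person> Merge(List<Student> students, List<Employee> employees)
    {
        // Same group-join filter as above, but selecting the employee itself.
        var nonMatchingEmployees =
            from employee in employees
            join student in students
            on employee.TimeStamp equals student.TimeStamp into studentsWithMatchingTimeStamp
            where !studentsWithMatchingTimeStamp.Any()
            select employee;

        // IEnumerable<T> is covariant, so sequences of Student and Employee
        // can be passed where IEnumerable<Person> is expected.
        var persons = new List<Person>(students);
        persons.AddRange(nonMatchingEmployees);
        return persons;
    }

    public static void Main()
    {
        // Made-up sample data: Cid shares a TimeStamp with Ann and is filtered out.
        var students = new List<Student>
        {
            new Student { Name = "Ann", Age = 20, TimeStamp = 100 },
            new Student { Name = "Bob", Age = 22, TimeStamp = 200 }
        };
        var employees = new List<Employee>
        {
            new Employee { Name = "Cid", Age = 30, TimeStamp = 100 },
            new Employee { Name = "Dee", Age = 41, TimeStamp = 300 }
        };

        foreach (var p in Merge(students, employees))
            Console.WriteLine(p.Name + " " + p.Age);
    }
}
```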

Upvotes: 4

Lee

Reputation: 144206

You can use GroupJoin to find all employees without a matching Student record with the same timestamp:

var employeesDiffTS = allEmployee
    .GroupJoin(allStudents, e => e.TimeStamp, s => s.TimeStamp, (e, students) => new { Emp = e, HasMatch = students.Any() })
    .Where(em => !em.HasMatch)
    .Select(em => em.Emp);

personList2.AddRange(employeesDiffTS.Select(a => new Person { Age = a.Age, Name = a.Name }));

personList1.AddRange(personList2);

Upvotes: 2
