DaveLooper
DaveLooper

Reputation: 21

optimize the comparison in two lists with LINQ

I have two lists of object:

Customer And Employee

I need to check if there is at least 1 Client with the same name as an employee.

Currently I have:

client.ForEach(a =>
            {
                if (employee.Any(m => m.Name == a.Name && m.FirstName==a.FirstName)
                {
                    // OK TRUE
                }
            });

can I improve reading by doing it in another way?

Upvotes: 1

Views: 228

Answers (2)

Vladi Pavelka
Vladi Pavelka

Reputation: 926

I would go either with Join

var isDuplicated = clients.Join(employees, 
       c => new { c.Name, c.FirstName }, 
       e => new { e.Name, e.FirstName },
       (c, e) => new { c, e })
       .Any();

or Intersect

var clientNames = clients.Select(c => new { c.Name, c.FirstName });
var employeeNames = employees.Select(e => new { e.Name, e.FirstName });
var isDuplicated = clientNames.Intersect(employeeNames).Any();

Both of Join and Intersect use hashing, and are close to O(n).

Note: equality (and hash code) of anonymous objects (new { , }) is evaluated as for a value type. I.e. two anonymous objects are equal (implies have same hash code) when all their fields are equal.

=== EDIT: Ok, I was interested myself (hope your question was about performance :P)

[TestMethod]
public void PerformanceTest()
{
    var random = new Random();
    var clients = Enumerable.Range(0, 10000)
            .Select(_ => new Person { FirstName = $"{random.Next()}",      
                                      LastName = $"{random.Next()}" })
            .ToList();

    var employees = Enumerable.Range(0, 10000)
            .Select(_ => new Person { FirstName = $"{random.Next()}", 
                                      LastName = $"{random.Next()}" })
            .ToList();

    var joinElapsedMs = MeasureAverageElapsedMs(() =>
        {
            var isDuplicated = clients.Join(employees,
                    c => new { c.FirstName, c.LastName },
                    e => new { e.FirstName, e.LastName },
                    (c, e) => new { c, e })
                    .Any();
        });

    var intersectElapsedMs = MeasureAverageElapsedMs(() =>
        {
            var clientNames = clients.Select(c => new { c.FirstName, c.LastName });
            var employeeNames = employees.Select(e => new { e.FirstName, e.LastName });
            var isDuplicated = clientNames.Intersect(employeeNames).Any();
        });

    var anyAnyElapsedMs = MeasureAverageElapsedMs(() =>
        {
            var isDuplicated = clients.Any(c => employees.Any(
                    e => c.FirstName == e.FirstName && c.LastName == e.LastName));
        });

    Console.WriteLine($"{nameof(joinElapsedMs)}: {joinElapsedMs}");
    Console.WriteLine($"{nameof(intersectElapsedMs)}: {intersectElapsedMs}");
    Console.WriteLine($"{nameof(anyAnyElapsedMs)}: {anyAnyElapsedMs}");
}

private static double MeasureAverageElapsedMs(Action action) =>
    Enumerable.Range(0, 10).Select(_ => MeasureElapsedMs(action)).Average();

private static long MeasureElapsedMs(Action action)
{
    var stopWatch = Stopwatch.StartNew();
    action();
    return stopWatch.ElapsedMilliseconds;
}

public class Person
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
}

Output:

joinElapsedMs: 5.9
intersectElapsedMs: 3.5
anyAnyElapsedMs: 3185.8 

Note: any-any is O(n^2) - (in worst case) every employee is iterated per each iterated client.

Upvotes: 1

Barr J
Barr J

Reputation: 10927

why won't you check it before hand using join?

var mergedClients = Client.Join(listSFull,
                x => new { x.Name, x.FirstName},
                y => new { Name = y.Name, FirstName= y.FirstName},
                (x, y) => new { x, y }).ToList();

and then iterate over the new collection:

mergedClients.ForEach(a =>
//your logic

Only disadvantage of this approach (if it bothers you) is that null values will not be included.

Upvotes: 2

Related Questions