Will I Am
Will I Am

Reputation: 2672

Comparing two lists for differences

I have the following situation:

class A
{
   public A(string name, int age) { Name = name; Age = age; }
   public string Name;
   public int Age;
}

List<A> one = 
  new List<A>() { new A("bob", 15), new A("john", 10), new A("mary", 12) };
List<A> two = 
  new List<A>() { new A("bob", 15), new A("mary", 15), new A("cindy", 18) }; 

I would like to do a diff between these lists, and get back the information that john is only in list 1, cindy is only in list 2, and mary is in both lists BUT it's not an exact match (age is different). My goal is to present this information in a side by side comparison.

Can someone suggest how to do this efficiently (as opposed of doing three passes, one for stuff that doesn't exist in first, another for stuff that doesn't exist in second, and a third for stuff that exists in both but is different)

I'm sorry if I missed any duplicate questions, the ones I was able to find only dealt with a boolean result, not the actual details.

Upvotes: 0

Views: 119

Answers (3)

Zein Makki
Zein Makki

Reputation: 30022

The passes would be implicit or "explicit". And by explicit i mean hidden through some Linq extension methods. So you can do the below:

var results = (from item in one.Concat(two).Select(x => x.Name).Distinct()
                let inFirst = one.Find(x => x.Name == item)
                let inSecond = two.Find(x => x.Name == item)
                let location = inFirst != null 
                            && inSecond != null 
                                ? 2 : inSecond != null ? 1 : 0
                select new
                {
                    Name = item,
                    location = location == 0 ? "First" : location == 1 ? "Second" : "Both",
                    ExactMatch = (location != 2 || inFirst.Age == inSecond.Age) 
                                  ? "YES" : $"One: { inFirst.Age } | Two: { inSecond.Age }"
                }).ToList();

Results:

{ Name = bob, location = Both, ExactMatch = YES }
{ Name = john, location = First, ExactMatch = YES }
{ Name = mary, location = Both, ExactMatch = One: 12 | Two: 15 }
{ Name = cindy, location = Second, ExactMatch = YES }

If you're worried about performance, use effiecient Data Structures for lookup O(1). The below finishes in 33ms for 10000 item lists, while the above finishes in around 5000ms:

var oneLookup = one.ToLookup(x => x.Name, x => x);
var twoLookup = two.ToLookup(x => x.Name, x => x);

var results = (from item in one.Concat(two).Select(x => x.Name).Distinct()
                let inFirst = oneLookup[item].FirstOrDefault()
                let inSecond = twoLookup[item].FirstOrDefault()
                let location = inFirst != null
                            && inSecond != null
                                ? 2 : inSecond != null ? 1 : 0
                select new
                {
                    Name = item,
                    location = location == 0 ? "First" : location == 1 ? "Second" : "Both",
                    ExactMatch = (location != 2 || inFirst.Age == inSecond.Age)
                                  ? "YES" : $"One: { inFirst.Age } | Two: { inSecond.Age }"
                }).ToList();

Upvotes: 1

Cameron MacFarland
Cameron MacFarland

Reputation: 71856

var result = 
    one.Select(a => Tuple.Create(a, "one")) // tag list one items
    .Concat(two.Select(a => Tuple.Create(a, "two"))) // tag list two items
    .GroupBy(t => t.Item1.Name) // group items by Name
    .ToList(); // cache result

var list_one_only = result.Where(g => g.Count() == 1 && g.First().Item2 == "one");
var list_two_only = result.Where(g => g.Count() == 1 && g.First().Item2 == "two");
var both_list_diff = result.Where(g => g.Count() == 2 && g.First().Item1.Age != g.Skip(1).First().Item1.Age);

This will return a list of lists, where each inner list will either be 1 item (the tuple with the original item and which list it's from) or there will be 2 items (the same name, possibly same age, and which age is from which list).

I wasn't sure what structure you wanted the results in exactly, so I left it there. Otherwise from here you could do another select to filter out same records in both lists ("bob"), etc.

This solution should only iterate through both lists once.

Upvotes: 2

Ivan Stoev
Ivan Stoev

Reputation: 205579

Can someone suggest how to do this efficiently (as opposed of doing three passes, one for stuff that doesn't exist in first, another for stuff that doesn't exist in second, and a third for stuff that exists in both but is different)

The only way to produce difference in one pass is if the two sequences are ordered by the identity key (Name in your case). However ordering introduces additional cost and also the process cannot be coded in LINQ.

What you really need is the full outer join, which has no natural LINQ support. So the classical approach requires two passes - left other join for stuff that exists in one and eventually in second, and right antijoin for stuff that exists only in second. This is so far the most efficient LINQ way.

The query could be like this:

var result =
    (from e1 in one
     join e2 in two on e1.Name equals e2.Name into match
     from e2 in match.DefaultIfEmpty()
     select new
     {
         e1.Name,
         In = e2 == null ? "A" : "A,B",
         Age = e2 == null || e1.Age == e2.Age ? e1.Age.ToString() : $"A:{e1.Age} B:{e2.Age}"
     })
    .Concat
    (from e2 in two
     join e1 in one on e2.Name equals e1.Name into match
     where !match.Any()
     select new { e2.Name, In = "B", Age = e2.Age.ToString() })
    .ToList();

which produces the following from your sample data:

{ Name = bob, In = A,B, Age = 15 }
{ Name = john, In = A, Age = 10 }
{ Name = mary, In = A,B, Age = A:12 B:15 }
{ Name = cindy, In = B, Age = 18 }

Of course you can output whatever you want. As you can see, the only place where you need to account for having two matching elements is the first part of the query.

Upvotes: 1

Related Questions