DDulla
DDulla

Reputation: 679

Finding duplicate records with LINQ with matching on multiple columns?

I'm trying to write a query to find all potential duplicate records for a user. A duplicate is defined as any of the following:

I have data coming from two separate systems, from system B the externalID on my table will be populated with a GUID, from system A the externalID will be null. I'm only concerned with finding duplicates between the two systems, not duplicates within each system.

The query i have thus far:

 Dim duplicates = db.People.Where(Function(c) c.CID = CID And c.DeleteFlag = False). _
        Where(Function(p) p.EmailAddress <> Nothing And p.ExternalID <> Nothing) _
       .GroupBy(Function(i) New With {i.EmailAddress, i.LastName, i.FirstName}) _
       .Where(Function(g) g.Count() > 1).[Select](Function(g) g.Key)

Issues I am having:

Upvotes: 0

Views: 890

Answers (1)

Tim Schmelter
Tim Schmelter

Reputation: 460208

Never use <> or = to compare with Nothing, you 'll always get False(similar as in sql). Nothing isn't equal or unequal to anything else. Instead use IsNot or Is Nothing. Also use AndAlso instead of And (and OrElse instead of Or).

In VB.NET (as opposed to C#) you also have to specify the key-columns which are used for Equals and GetHashCode of the anonymous type.

Dim duplicates = db.People.Where(Function(c) c.CID = CID And c.DeleteFlag = False). 
       Where(Function(p) p.EmailAddress IsNot Nothing AndAlso p.ExternalID IsNot Nothing).
       GroupBy(Function(i) New With {Key i.EmailAddress, Key i.LastName, Key i.FirstName}).
       Where(Function(g) g.Count() > 1).
       Select(Function(g) g.Key)

Upvotes: 2

Related Questions