How to highlight rows that have duplicate values in 2 or more columns

Question

I am trying to figure out how to best do this. I have a sheet which has about 44 columns and around 64,000 rows. The columns have different customer data points such as name, date of birth, phone number, and e-mail (these are the most relevant columns for my purposes). I was wondering how I could sort by or highlight the rows in which at least three column data points match, to show a duplicate record for a customer. To explain clearly, I only want to highlight the rows that are duplicates based on at least 3 columns (the name column (the constant) and either phone number or DOB or e-mail.)

For example:

In the above, John Smith matched based on DOB alone. Lisa winters based on email, and Stephanie wright based on both DOB and email.

Now that I am looking at it more I will combine first and last name into one column so it will only have to match 2 or more columns instead of three.

I posted in superuser and all I got was countifs which seems like a start, but I seem to need to incorporate " and, or" logic as well?

Any help with specific formulas is greatly appreciated!

Tom Sharpe · Accepted Answer

Just for comparison, this would be the array-type approach but as @Luuklag rightly says, it could be slow with 64K rows of data, although it does give complete results

=SUMPRODUCT(($A2<>"")*($A2=$A$2:$A$10)*($B2=$B$2:B$10)*SIGN((($C2=$C$2:$C$10)+($D2=$D$2:$D$10)+($E2=$E$2:$E$10))))>1

So this tests all rows to see if there is more than one which agrees with the current row on last name, first name, and one of DOB, phone and email, assuming your data is in the first five columns and omitting any rows where last name is blank. Adjust ranges to suit.

This is too slow on 64K rows. A little better is to use SUMIFS

=(COUNTIFS($A$2:$A$64000,$A2,$B$2:$B$64000,$B2,$C$2:$C$64000,$C2)
+COUNTIFS($A$2:$A$64000,$A2,$B$2:$B$64000,$B2,$D$2:$D$64000,$D2)
+COUNTIFS($A$2:$A$64000,$A2,$B$2:$B$64000,$B2,$E$2:$E$64000,$E2))>3

How to highlight rows that have duplicate values in 2 or more columns

Answers (2)

Related Questions