eqiz
eqiz

Reputation: 1591

SQL - Advance Comparison between 2 tables using "IS LIKE"

Ok this is going to get a bit chaotic so please try to stay with me..

I got a table of information kind of like this...

Table Name: Customers
_____________________________
ID   | CompanyName | FirstName | LastName  | Phone
-------------------------------
1    | Joes        | Joe       | James     | 1233334444
2    | Kennys      | Kenny     | Johnson   | 2222334555
3    | Kellys      | Kelly     | Gibson    | 5454445445
4    | Ricks #1    | Rick      | Lawson    | 4545334222
5    | Johns #1    | Johny B   | James     | 4545222211
6    | Johns #2    | Johny     | James     | 4545222211
7    | Johns #3    | Johny     | James     | 4545222211
8    | Ricks #2    | Rick A    | Lawson    | 4545334222

I need to know how to create an SQL statement that finds all the duplicate records, adds them up, then displays the total amount of duplicates found with the first company name found.

*HERE's THE CATCH - As you see from the table above, sometimes they put their initials with their name (Example - ID5 and ID6 are the same exact person, but he put his middle initial in ID5, you can see the same with ID4 and ID7)

I need the SQL statement to base the comparison off first matching the lastname, then doing a "IS LIKE" on the first name and company name "kind-of-thing" to make sure that they are being counted as they should be.

For Example - I should get a result similar to...

Table Name: TableResults
_____________________________
ID   | CompanyName | FirstName | LastName  | Phone       | Count
-------------------------------
4    | Ricks #1    | Rick      | Lawson    | 4545334222  | 2
5    | Johns #1    | Johny B   | James     | 4545222211  | 3

Is this even possible?

Upvotes: 0

Views: 141

Answers (2)

curtisdf
curtisdf

Reputation: 4210

I believe the name analysis will be some really complicated SQL, if it's possible at all. First off, you may want to make sure your name columns use a Case-insensitive collation such as utf8_general_ci. I wouldn't trust people to consistently capitalize their names. They may also have typo's if they're being careless.

LIKE isn't smart enough to match on strings that have small differences unless you know exactly where to put the %. It's more just a wildcard matching function.

Maybe check out using a SOUNDEX algorithm? But this will only work if all of your data will be in English.

Is the Phone number a reliable way to identify duplicates? If so, you could skip the name analysis and just do this:

SELECT ID, CompanyName, FirstName, LastName, Phone, COUNT(1) AS Count
FROM Customers 
GROUP BY Phone
ORDER BY ID ASC

Upvotes: 0

Chandu
Chandu

Reputation: 82913

I think you might need to use fuzzy matching.

Try this:

WITH InitialData AS
(
  SELECT *, SOUNDEX(LastName) + SOUNDEX(CompanyName) Similarity 
    FROM Customers
), FinalData AS
(
  SELECT *, 
          ROW_NUMBER() OVER(PARTITION BY Similarity ORDER BY ID) Position,
          COUNT(1)  OVER(PARTITION BY Similarity) Count
  FROM InitialData
)
SELECT ID, CompanyName, FirstName, LastName, Phone, Count
  FROM FinalData
WHERE Position = 1
  AND Count > 1

SQLFiddle: http://sqlfiddle.com/#!3/7c1fb/2

Upvotes: 2

Related Questions