Reputation: 1
i have [RecToProcessId, intContactId, vchrCompany, Oldcompany Title, vchrLastName, vchrFirstName, Designation, vchrAddress1, vchrAddress2 vchrPhoneNo, vchrBusinessNo, vchrMobileNo, vchrFaxNo, Email_Old, Email_New, ,Country, vchrBusinessCity, vchrZipCode, STD_Code, ISD_Code, dtUpdateDate, CITY_FLAG, RecordStatus, CompanyUrl,ContactUrl] Columns in one table with more than 50000 records in it i already tried finding duplicates records using Email_New column but there were some junk email too
{SELECT *
into INDIVIDUAL_DEDUPE_DATA
FROM (SELECT *, COUNT(*) OVER (PARTITION BY vchrEmail) AS dup_key
FROM FINAL_RECEIVED_COMBINED_INDIVIDUAL_UNCLEANED_NON_COMPANY
) T
WHERE vchrEmail is not null
and len(vchrEmail)<>0
and dup_key>1
and vchrEmail<>'0'
and vchrEmail<>'-'
and vchrEmail not like '%abc%'
and vchrEmail not like '%xyz%'
and vchrEmail not like '%email%'
and vchrEmail not like '%info%'
and vchrEmail not like '%no@email%'
and vchrEmail not like '%no@gmail%'
and vchrEmail not like '%test%'
and vchrEmail not like '%test@test%'
and vchrEmail not like '%xxx%'
and vchrEmail not like '%xxx@xxx%'
and vchrEmail not like '%xy@%'
ORDER BY vchrEmail}
how can find duplicates on vchrMobileNo and vchrFirstName+vchrLastName as name column or any other ways ?
Upvotes: 0
Views: 65
Reputation: 108995
The answer is to group by
...
select count(1), col1, col2
from theTable
group by col1,col2
having count(1) > 1
Will return rows where there is more than one row with the same value for the two specified columns. Once you have this information you can use it as a sub-query to do something with those rows. Eg. to see all the columns in them:
select outer.*
from theTable outer
inner join (select col1, col2
from theTable
group by col1,col2
having count(1) > 1) as inner
on outer.col1 = inner.col1 and outer.col2 = inner.col2
Upvotes: 2