Mechlar
Mechlar

Reputation: 4974

Filter Duplicate records by hierarchy?

I have a table called 1 Main Contacts ("Main"), and several other tables. They all have column named Contact_ID (or referral_ID), which hooks them up. A problem with the way this is all setup is that records in "Main" can be linked to more than one record in the referral table, so I get duplicate records after running a query to pull contacts by the "Contact_Source" column which is in the referral table.

I have created a view from which I can Select from when running queries from a website so I am pulling from data pertinent to the correct situation. I run this query by their "Contact Source" as well. I posted 2 questions earlier (here and here) in order to keep from getting duplicate records. I have that working.

Here is the final code I got to keep me from getting duplicate records:

ALTER VIEW dbo.v_angelview AS

WITH    q AS
        (
        SELECT  dbo.[1_MAIN - Contacts].Contact_ID, dbo.[1_MAIN - Contacts].Date_entered_into_Database, dbo.[1_MAIN - Contacts].Date_of_Initial_Contact, 
                      dbo.[1_MAIN - Contacts].[Company_ Name], dbo.[1_MAIN - Contacts].Key_Contact_Title, dbo.[1_MAIN - Contacts].Key_Contact_First_Name, 
                      dbo.[1_MAIN - Contacts].Key_Contact_Middle, dbo.[1_MAIN - Contacts].Key_Contact_Last_Name, dbo.[1_MAIN - Contacts].Key_Credential, 
                      dbo.[1_MAIN - Contacts].Key_Contact_Occupation, dbo.[1_MAIN - Contacts].Key_Degree_1, dbo.[1_MAIN - Contacts].Key_Degree_2, 
                      dbo.[1_MAIN - Contacts].Key_Degree_3, dbo.[1_MAIN - Contacts].Date_of_Highest_Degree, dbo.[1_MAIN - Contacts].Work_Setting, 
                      dbo.[1_MAIN - Contacts].Website_Address, dbo.[1_MAIN - Contacts].Email_1_Key_Contact, dbo.[1_MAIN - Contacts].Email_2, 
                      dbo.[1_MAIN - Contacts].Email_3, dbo.[1_MAIN - Contacts].Day_Time_Phone_Number, dbo.[1_MAIN - Contacts].Extension, 
                      dbo.[1_MAIN - Contacts].Mobile_Phone_Number, dbo.[1_MAIN - Contacts].Bus_Fax_Number, dbo.[1_MAIN - Contacts].Home_Phone_Number, 
                      dbo.[1_MAIN - Contacts].Home_Fax_Number, dbo.[1_MAIN - Contacts].Mailing_Street_1, dbo.[1_MAIN - Contacts].Mailing_Street_2, 
                      dbo.[1_MAIN - Contacts].Mailing_City, dbo.[1_MAIN - Contacts].Mailing_State, dbo.[1_MAIN - Contacts].[Mailing_Zip/Postal], 
                      dbo.[1_MAIN - Contacts].Mailing_Country, dbo.[1_MAIN - Contacts].[Bad_Address?], dbo.[1_MAIN - Contacts].[PROV/REG?], 
                      dbo.[1_MAIN - Contacts].status_flag, dbo.[1_MAIN - Contacts].status_flag AS status_flag2, dbo.Providers.Referral_Source, dbo.Referral.Contact_Source, 
                      dbo.Resource_Center.cert_start_date, dbo.Resource_Center.cert_exp_date, dbo.prov_training_records.Contact_ID AS Expr2, 
                      dbo.prov_training_records.date_reg_email_sent, dbo.Resource_Center.access, dbo.Providers.Contact_ID AS Expr1,
                ROW_NUMBER() OVER (PARTITION BY dbo.[1_MAIN - Contacts].Contact_ID ORDER BY dbo.[1_MAIN - Contacts].Contact_ID) AS rn
        FROM    dbo.[1_MAIN - Contacts]
        INNER JOIN
                dbo.Referral
        ON      dbo.[1_MAIN - Contacts].Contact_ID = dbo.Referral.Referral_ID
        INNER JOIN
                dbo.prov_training_records
        ON      dbo.[1_MAIN - Contacts].Contact_ID = dbo.prov_training_records.Contact_ID
        LEFT OUTER JOIN
                dbo.Resource_Center
        ON      dbo.[1_MAIN - Contacts].Contact_ID = dbo.Resource_Center.Contact_ID
        FULL OUTER JOIN
                dbo.Providers
        ON      dbo.[1_MAIN - Contacts].Contact_ID = dbo.Providers.Contact_ID
WHERE (dbo.[1_MAIN - Contacts].Mailing_State = N'AL') OR
                      (dbo.[1_MAIN - Contacts].Mailing_State = N'FL') OR

                      (dbo.[1_MAIN - Contacts].Mailing_State = N'GA') OR
                      (dbo.[1_MAIN - Contacts].Mailing_State = N'KY') OR
                      (dbo.[1_MAIN - Contacts].Mailing_State = N'MS') OR
                      (dbo.[1_MAIN - Contacts].Mailing_State = N'NC') OR
                      (dbo.[1_MAIN - Contacts].Mailing_State = N'SC') OR
                      (dbo.[1_MAIN - Contacts].Mailing_State = N'TN') OR
                      (dbo.[1_MAIN - Contacts].Mailing_State = N'PR') OR
                      (dbo.[1_MAIN - Contacts].Mailing_State = N'CO') OR
                      (dbo.[1_MAIN - Contacts].Mailing_State = N'MT') OR
                      (dbo.[1_MAIN - Contacts].Mailing_State = N'ND') OR
                      (dbo.[1_MAIN - Contacts].Mailing_State = N'SD') OR
                      (dbo.[1_MAIN - Contacts].Mailing_State = N'UT') OR
                      (dbo.[1_MAIN - Contacts].Mailing_State = N'WY') OR
                      (dbo.[1_MAIN - Contacts].Mailing_State = N'AR') OR
                      (dbo.[1_MAIN - Contacts].Mailing_State = N'LA') OR
                      (dbo.[1_MAIN - Contacts].Mailing_State = N'NM') OR
                      (dbo.[1_MAIN - Contacts].Mailing_State = N'OK') OR
                      (dbo.[1_MAIN - Contacts].Mailing_State = N'TX') OR
                      (dbo.[1_MAIN - Contacts].Mailing_State = N'AZ') OR
                      (dbo.[1_MAIN - Contacts].Mailing_State = N'CA') OR
                      (dbo.[1_MAIN - Contacts].Mailing_State = N'HI') OR
                      (dbo.[1_MAIN - Contacts].Mailing_State = N'ID') OR
                      (dbo.[1_MAIN - Contacts].Mailing_State = N'NV')
        )
SELECT  *
FROM    q
WHERE   rn = 1

Now the problem I am facing is that I need to set a precedence of which duplicate record I keep and which ones get removed.

This is the hierarchy of precedence starting with the most precedent:

  1. Contact_Source = 'PROVIDER'
  2. Contact_Source LIKE 'RG_%'
  3. Contact_Source LIKE 'IN_%'
  4. Contact_Source LIKE 'LD_%'

A record could be linked to one or all of these. So, for example, if a record has PROVIDER and RG_Train, I want to keep the record with PROVIDER. And so on down the list. Again, all records have a Contact_ID, that is how I can tell there are duplicates.

Is there a way to modify my existing SQL to do this or does this require a new approach? If so, how do I get it to remove duplicate records according to my precedence list?

I am using SQL Server 2005.

Thanks in advance!

Upvotes: 0

Views: 250

Answers (2)

Dr. Wily's Apprentice
Dr. Wily's Apprentice

Reputation: 10280

This doesn't take advantage of your use of the ROW_NUMBER function, but I believe this would work:

with q as (
    -- <query>
),
cp as (
    select  1 as Precedence, 'PROVIDER' ContactSourcePattern
    union all   select 2, 'RG_%'
    union all   select 3, 'IN_%'
    union all   select 4, 'LD_%'
)
select q.*
from q
inner join cp on q.Contact_Source like cp.ContactSourcePattern
where
    -- filter out duplicate records with the same `Contact_ID` that have a lower precedence than other records
    not exists (
        select 1
        from q as q2 inner join cp as cp2 on q2.Contact_Source like cp2.ContactSourcePattern
        where
            q2.Contact_ID = q.Contact_ID -- q2 is a duplicate of q if `Contact_ID` matches
            and cp2.Precedence < cp.Precedence -- q2/cp2 is higher precedence than q/cp if `Precedence` is a smaller number
    )

Upvotes: 0

Ben Tennen
Ben Tennen

Reputation: 445

Try...

rn = ROW_NUMBER() 
OVER
(
  PARTITION BY dbo.[1_MAIN - Contacts].Contact_ID
  ORDER BY 
  CASE 
    WHEN Contact_Source = 'PROVIDER'
    THEN 1

    WHEN Contact_Source LIKE 'RG_%'
    THEN 2

    WHEN Contact_Source LIKE 'IN_%'
    THEN 3

    WHEN Contact_Source LIKE 'LD_%'
    THEN 4

    ELSE 5
  END
)

Upvotes: 1

Related Questions