Michael O'Keefe
Michael O'Keefe

Reputation: 53

SQL: Eliminating duplicates with specific conditions

I have a SQL database that SOMETIMES has duplicate values, but only in one column (phone number). If there is a duplicate, the other attributes in the same row are filled in with NULL. In other cases, the phone number is not duplicated, but still has NULL values in the rows. Ex:

first_name last_name phone_number
john smith 123-456-7890
NULL NULL 123-456-7890
NULL NULL 456-789-1011
carry smith 121-314-1516

I'm trying to write a query that eliminates cases where the phone number is duplicated and the other values in the row are NULL, to get:

first_name last_name phone_number
john smith 123-456-7890
NULL NULL 456-789-1011
carry smith 121-314-1516

Any ideas?

Upvotes: 0

Views: 280

Answers (4)

Gary Martirosyan
Gary Martirosyan

Reputation: 44

I would use cte for it. Here's the code that does it.

    with cte as (
    select phone_number from phone_numbers 
    group by phone_number
    having count(*) > 1
)

delete phone_numbers
where phone_number in (select phone_number from cte)
and first_name is null and last_name is null

Upvotes: 0

Hogan
Hogan

Reputation: 70523

Here is the fastest way to do it, left join to the items you want to remove and then add a where clause for null results for the join. Every row that meets the join requirements WILL NOT be in the results.

I call this an Exclusionary Left Join.

SELECT *
FROM tableyoudidnotname main
LEFT JOIN tableyoudidnotname sub on 
               main.phone_number = sub.phone_number 
           and sub.first_name is null 
           and sub.last_name is null
WHERE sub.phone_number is null

Upvotes: 0

Josh
Josh

Reputation: 1616

One way, might be to use a subquery to identify the phone_numbers only once.. and then outer join to the records you want without nulls. Something like this:

SELECT *
FROM 
 (SELECT phone_number AS root_phone_number
  FROM table
  GROUP BY phone_number
 ) AS phonenumbers
LEFT OUTER JOIN
 (SELECT *
  FROM table
  WHERE first_name IS NOT NULL
 ) as notnulls
ON phonenumbers.phone_number = notnulls.phone_number

Upvotes: 0

Kurt
Kurt

Reputation: 1748

In cases like this you probably want a NOT EXISTS clause. This does a lookup for each row in the table, to see if there are any other records with the same phone number and populated name fields.

select
    first_name,
    last_name,
    phone_number
from
    phone_numbers pn
where
    not exists (
        select 1
        from phone_numbers pn2
        where pn2.phone_number = pn.phone_number
        and pn.first_name is not null
        and pn.last_name is not null
    )

Although I'm not sure it's perfect. If there is a case where two records have the same phone number and both have NULL names then neither would be returned.

Upvotes: 2

Related Questions