Reputation: 520

Sql query to flter by column only if duplicate exists

I'm writing a query on data with a 'provider_id' column. The data is mostly unique except for some entries with all columns but 'provider_id' being unique. Most of the values for provider_id are 1 but when there are duplicates as described above I want to remove the row with provider_id = 1. There are only ever 2 duplicates at a time.

The query below lets me find all the duplicates but I'm not sure how to then select the correct row based off this.

select position_key, pricing_provider
from customer_data
group by position_key, pricing_provider
having count(*) > 0

How can I select out where pricing_provider != 1?

Ex:

 position_key | pricing_provider |account | user 
152894  1   2   5
152704  1   2   1084
152678  6   2   43
152513  6   2   38
152305  1   2   1121
152305  6   2   1121
152300  6   2   1121
152300  1   2   1121

I want:

 position_key | pricing_provider |account | user 
152894  1   2   5
152704  1   2   1084
152678  6   2   43
152513  6   2   38
152305  6   2   1121
152300  6   2   1121

Upvotes: 1

Answers (3)

Gordon Linoff

Reputation: 1269443

If you care about duplicates, I would expect > 1, not > 0. Then, window functions are probably the best solution:

select position_key, pricing_provicer
from (select position_key, pricing_provider,
             count(*) filter (where pricing_provider <> 1) over (partition by position_key) as cnt_not_1
      from customer_data
      group by position_key, pricing_provider
      having count(*) > 1
     ) pp
where pricing_provider <> 1 or cnt_not_1 = 0;

Upvotes: 1

Hogan

Reputation: 70513

Just add a where to an outer select

SELECT * 
FROM (
  select position_key, pricing_provider
  from customer_data
  group by position_key, pricing_provider
  having count(*) > 0
) X
WHERE pricing_provider <> 1

Upvotes: 1

sam

Reputation: 1985

I do not know PostgresSQL. But in SQL server, we can use something like below:

select position_key, pricing_provider
from customer_data o
join (
   select position_key, pricing_provider
   from customer_data
    group by position_key, pricing_provider
    having count(1) > 1) as dupes on o.pricing_provider = duples.pricing_provider

Upvotes: 0

Sql query to flter by column only if duplicate exists

Answers (3)

Related Questions