Reputation: 520
I'm writing a query on data with a 'provider_id' column. The data is mostly unique except for some entries with all columns but 'provider_id' being unique. Most of the values for provider_id are 1 but when there are duplicates as described above I want to remove the row with provider_id = 1. There are only ever 2 duplicates at a time.
The query below lets me find all the duplicates but I'm not sure how to then select the correct row based off this.
select position_key, pricing_provider
from customer_data
group by position_key, pricing_provider
having count(*) > 0
How can I select out where pricing_provider != 1?
Ex:
position_key | pricing_provider |account | user
152894 1 2 5
152704 1 2 1084
152678 6 2 43
152513 6 2 38
152305 1 2 1121
152305 6 2 1121
152300 6 2 1121
152300 1 2 1121
I want:
position_key | pricing_provider |account | user
152894 1 2 5
152704 1 2 1084
152678 6 2 43
152513 6 2 38
152305 6 2 1121
152300 6 2 1121
Upvotes: 1
Views: 60
Reputation: 1269443
If you care about duplicates, I would expect > 1
, not > 0
. Then, window functions are probably the best solution:
select position_key, pricing_provicer
from (select position_key, pricing_provider,
count(*) filter (where pricing_provider <> 1) over (partition by position_key) as cnt_not_1
from customer_data
group by position_key, pricing_provider
having count(*) > 1
) pp
where pricing_provider <> 1 or cnt_not_1 = 0;
Upvotes: 1
Reputation: 70513
Just add a where to an outer select
SELECT *
FROM (
select position_key, pricing_provider
from customer_data
group by position_key, pricing_provider
having count(*) > 0
) X
WHERE pricing_provider <> 1
Upvotes: 1
Reputation: 1985
I do not know PostgresSQL. But in SQL server, we can use something like below:
select position_key, pricing_provider
from customer_data o
join (
select position_key, pricing_provider
from customer_data
group by position_key, pricing_provider
having count(1) > 1) as dupes on o.pricing_provider = duples.pricing_provider
Upvotes: 0