Reputation: 4648
I have a table like this
CREATE TABLE userinteractions
(
userid bigint,
dobyr int,
-- lots more fields that are not relevant to the question
);
My problem is that some of the data is polluted with multiple dobyr
values for the same user.
The table is used as the basis for further processing by creating a new table. These cases need to be removed from the pipeline.
I want to be able to create a clean table that contains unique userid
and dobyr
limited to the cases where there is only one value of dobyr
for the userid
in userinteractions
.
For example I start with data like this:
userid,dobyr
1,1995
1,1995
2,1999
3,1990 # dobyr values not equal
3,1999 # dobyr values not equal
4,1989
4,1989
And I want to select from this to get a table like this:
userid,dobyr
1,1995
2,1999
4,1989
Is there an elegant, efficient way to get this in a single sql query?
I am using postgres.
EDIT: I do not have permissions to modify the userinteractions
table, so I need a SELECT
solution, not a DELETE
solution.
Upvotes: 1
Views: 387
Reputation: 25988
Clarified requirements: your aim is to generate a new, cleaned-up version of an existing table, and the clean-up means:
userid
value but also the same dobyr
value, one of them is kept (doesn't matter which one), rest gets discarded.userid
are discarded if it occurs with different dobyr
values.create table userinteractions_clean as
select distinct on (userid,dobyr) *
from userinteractions
where userid in (
select userid
from userinteractions
group by userid
having count(distinct dobyr)=1 )
order by userid,dobyr;
This could also be done with an not in
, not exists
or exists
conditions. Also, select which combination to keep by adding columns at the end of order by
.
Updated demo with tests and more rows.
If you don't need the other columns in the table, only something you'll later use as a filter/whitelist, plain userid
's from records with (userid,dobyr)
pairs matching your criteria are enough, as they already uniquely identify those records:
create table userinteractions_whitelist as
select userid
from userinteractions
group by userid
having count(distinct dobyr)=1
Upvotes: 1
Reputation: 86706
Just use a HAVING clause to assert that all rows in a group must have the same dobyr.
SELECT
userid,
MAX(dobyr) AS dobyr
FROM
userinteractions
GROUP BY
userid
HAVING
COUNT(DISTINCT dobyr) = 1
Upvotes: 0