Reputation: 2204
I have a table with duplicate entries (rows sharing the same uid).
I want to remove the duplicates from the query result using a GROUP BY clause. Only one of the duplicate entries is valid, and which one is effectively random; I can only find out which by joining with db2 via rid. I am using an aggregate function (MAX or MIN) to keep one row per group, but the issue is that I can't select the specific row I need, so my query returns fewer rows than it should.
SELECT
MAX(db1.id) AS id
FROM db1
LEFT JOIN db2 ON
db1.rid = db2.rid
GROUP BY db1.uid
Is there anything I can do to get around this? I'm running queries on Amazon Redshift.
Example:
Leads Table
id | uid | rid
1 | aaa | 111
2 | aaa | 123
3 | bbb | 312
Sales table
id | rid
1 | 111
2 | 123
3 | 312
The leads table has a duplicate entry (IDs 1 and 2 share the same uid). I have to remove one of the duplicates, and it has to be ID 2, because ID 1 can be found in the sales table (linked by rid).
Upvotes: 1
Views: 230
Reputation: 1270993
I think you want something like this:
select db1.id
from (select db1.*, count(*) over (partition by db1.uid) as cnt
from db1
) db1 left join
db2
on db2.rid = db1.rid
where cnt = 1 or db2.rid is not null;
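A minimal runnable sketch of this count-over-partition approach, using sqlite3 for demonstration. Note the dataset is an assumed variant of the question's: sales is missing rid 123, so one of the two 'aaa' leads has no sales match and gets filtered out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE db1 (id INTEGER, uid TEXT, rid INTEGER);  -- leads
    CREATE TABLE db2 (id INTEGER, rid INTEGER);            -- sales
    INSERT INTO db1 VALUES (1, 'aaa', 111), (2, 'aaa', 123), (3, 'bbb', 312);
    INSERT INTO db2 VALUES (1, 111), (3, 312);             -- rid 123 absent (assumption)
""")
rows = conn.execute("""
    SELECT db1.id
    FROM (SELECT db1.*, COUNT(*) OVER (PARTITION BY db1.uid) AS cnt
          FROM db1
         ) db1
    LEFT JOIN db2 ON db2.rid = db1.rid
    -- keep rows that are either unique for their uid or have a sales match
    WHERE cnt = 1 OR db2.rid IS NOT NULL
    ORDER BY db1.id
""").fetchall()
print([r[0] for r in rows])  # -> [1, 3]
```

Lead id 2 is dropped here because its uid is duplicated and its rid has no sales row; lead id 1 survives via its sales match.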
Upvotes: 1
Reputation: 37483
You can use the row_number() window function:
with cte as
(
SELECT *, row_number() over(partition by db1.uid order by db1.id desc) as rn
FROM db1
LEFT JOIN db2 ON
db1.rid = db2.rid
)
delete from cte where rn<>1
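A runnable sketch of the row_number() idea, again using sqlite3: number each lead within its uid (highest id first) and keep only the first row per uid. This is an assumption-laden adaptation: the CTE DELETE above is swapped for a plain SELECT, since not every engine (Redshift included) allows deleting through a CTE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE db1 (id INTEGER, uid TEXT, rid INTEGER);  -- leads
    CREATE TABLE db2 (id INTEGER, rid INTEGER);            -- sales
    INSERT INTO db1 VALUES (1, 'aaa', 111), (2, 'aaa', 123), (3, 'bbb', 312);
    INSERT INTO db2 VALUES (1, 111), (2, 123), (3, 312);
""")
rows = conn.execute("""
    WITH cte AS (
        SELECT db1.id,
               ROW_NUMBER() OVER (PARTITION BY db1.uid
                                  ORDER BY db1.id DESC) AS rn
        FROM db1
        LEFT JOIN db2 ON db1.rid = db2.rid
    )
    -- keep only the first-ranked row per uid
    SELECT id FROM cte WHERE rn = 1 ORDER BY id
""").fetchall()
print([r[0] for r in rows])  # -> [2, 3]
```

With `order by db1.id desc` this keeps the highest id per uid; flip to `asc` (or order by a column from db2) to pick a different survivor.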
Upvotes: 2
Reputation: 32001
You can use DISTINCT:
SELECT
distinct db1.id AS id
FROM db1
LEFT JOIN db2 ON
db1.rid = db2.rid
Upvotes: 1
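A small sqlite3 sketch of what DISTINCT does in this query: if db2 carries several sales rows for the same rid, the join fans each lead out into multiple identical result rows, and DISTINCT collapses them. The extra db2 row for rid 111 is an assumption added for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE db1 (id INTEGER, uid TEXT, rid INTEGER);  -- leads
    CREATE TABLE db2 (id INTEGER, rid INTEGER);            -- sales
    INSERT INTO db1 VALUES (1, 'aaa', 111), (2, 'aaa', 123), (3, 'bbb', 312);
    INSERT INTO db2 VALUES (1, 111), (2, 123), (3, 312), (4, 111);  -- second sale for rid 111 (assumption)
""")
plain = conn.execute("""
    SELECT db1.id AS id
    FROM db1 LEFT JOIN db2 ON db1.rid = db2.rid
    ORDER BY id
""").fetchall()
deduped = conn.execute("""
    SELECT DISTINCT db1.id AS id
    FROM db1 LEFT JOIN db2 ON db1.rid = db2.rid
    ORDER BY id
""").fetchall()
print([r[0] for r in plain])    # -> [1, 1, 2, 3]
print([r[0] for r in deduped])  # -> [1, 2, 3]
```

Note that DISTINCT only removes fully identical result rows; two leads sharing a uid but having different ids both remain in the output.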