Sebastian
Sebastian

Reputation: 2204

Run a JOIN statement that excludes duplicate rows

I have a table with duplicated entries (they have the same uid).

I want to remove the duplicates from the query result by using a GROUP BY clause (one of the entries is valid, but it is random. I can only find out what the value is by joining it with db2 via rid. I am using an aggregate function (MAX or MIN) to keep a row, but the issue I'm having is that I can't select the row I need. This results in getting fewer rows from my query.

SELECT
MAX(db1.id) AS id,
FROM db1
LEFT JOIN db2 ON
db1.rid = db2.rid
GROUP BY r.uid

Is there anything I can do to get around this? I'm running queries on Amazon Redshift.

Example:

Leads Table

id | uid | rid
1 | aaa | 111
2 | aaa | 123
3 | bbb | 312

Sales table

id | rid
1 | 111
2 | 123
3 | 312

The leads table has a duplicate entry (ID 1 and 2) because uid is identical. I have to remove one of the duplicates, but that has to be ID 2 because ID 1 can be found in the sales table (linked by rid).

Upvotes: 1

Views: 230

Answers (3)

Gordon Linoff
Gordon Linoff

Reputation: 1270993

I think you want something like this:

select db1.id
from (select db1.*, count(*) over (partition by db1.uid) as cnt
      from db1
     ) db1 left join
     db2
     on db2.rid = db1.rid
where cnt = 1 or cb2.rid is not null;

Upvotes: 1

Fahmi
Fahmi

Reputation: 37483

You can use row_number() function

with cte as
    (
    SELECT *, row_number() over(partition by r.uid order by db1.id desc) as rn
    FROM db1
    LEFT JOIN db2 ON
    db1.rid = db2.rid
    )
delete from cte where rn<>1

Upvotes: 2

Zaynul Abadin Tuhin
Zaynul Abadin Tuhin

Reputation: 32001

you can use distinct

SELECT
distinct db1.id AS id
FROM db1
LEFT JOIN db2 ON
db1.rid = db2.rid

Upvotes: 1

Related Questions