toop
toop

Reputation: 11294

Self join without double result set?

There are some duplicate (appearing twice or more) combinations of (id, name, dept_name). Note: the dept_name is nullable so there are often some nulls in that column.

I am trying to get a result of the duplicate (id, name, dept_name) combos where one record has got a null mgr_id or sal but another record of that same duplicated combo has a non-null mgr_id or sal.

This is what I have so far:

 select t1.id, t1.name, t1.dept_name, t1.mgr_id, t2.mgr_id, t1.sal, t2.sal from employee t1
    inner join employee t2 on t2.id = t1.id and t2.name = t1.name
    where t1.id in
    (select id from 
    (select id, name, dept_name, count(*) from employee
    group by id, name, dept_name
    having count(*) > 1))
    and (nvl(t1.mgr_id,0) <> nvl(t2.mgr_id,0) 
    or nvl(t1.sal,0) <> nvl(t2.sal,0)) 
    and t1.mgr_id is not null ;

If I don't leave the last line "and t1.mgr_id is not null" in the query I effectively receive the same resultset twice (which I don't want). However I need some way of incorporating the (t1.sal is not null) clause to not get the same resultset twice.

Also, note there is no PK or sequence number column where I can just go: t1.seq_no <> t2.seq_no.

Upvotes: 1

Views: 2080

Answers (5)

tsells
tsells

Reputation: 2771

I created the following post to show how I normally find duplicates in the same table.

http://tsells.wordpress.com/2010/01/08/sql-query-trick-to-find-duplicate-records/

I find the criteria I want to find matches on (MyName in example below) and then use another field to determine which values are not equal (normally an ID Field).

select 
t1.ID,
t1.MyName,
t2.MyName,
t2.ID
from
Table t1 
inner join Table t2 on t1.MyName = t2.MyName and t1.ID <> t2.ID

Note that this can return what appears to be duplicates - but tweaking the query a bit can resolve that. Also I normally use this for spot checking data versus production queries.

Upvotes: 1

Kushal
Kushal

Reputation: 3168

Try this if it works...

select id,name,dept_name 
from employee 
where id in (select a.id 
             from employee a 
             where (a.sal is NULL or a.mgr_id is NULL) 
and a.id in (select b.id 
             from employee b 
             where b.sal is not NULL or b.mgr_id is not NULL));

Upvotes: 1

Florin Ghita
Florin Ghita

Reputation: 17643

This is because you have 4 situations.

  • sal null, dept null

  • sal null, dept nonnull

  • sal nonnull, dept null

  • sal nonnull, dept nonnull

And you have only two sets of columns to compare.

Somehow you'll get duplicates.

As a hibrid solution you can try:

select distinct t1.id, t1.name, t1.dept_name, t1.mgr_id, t2.mgr_id, t1.sal, t2.sal from employee t1
    inner join employee t2 on t2.id = t1.id and t2.name = t1.name
    where t1.id in
    (select id from 
    (select id, name, dept_name, count(*) from employee
    group by id, name, dept_name
    having count(*) > 1))
    and (nvl(t1.mgr_id,0) <> nvl(t2.mgr_id,0) 
    or nvl(t1.sal,0) <> nvl(t2.sal,0)) ;

(note the distinct)

or, you can use:

select distinct t1.id, t1.name, t1.dept_name, t1.mgr_id 
from employee t1
order by 1, 2, 3, 4

and you'll see the difference between rows.

Upvotes: 1

Tony Andrews
Tony Andrews

Reputation: 132600

You can use ROWID to distinguish the rows:

and t1.ROWID <> t2.ROWID

Upvotes: 1

StevieG
StevieG

Reputation: 8709

Can't you just add the filter to the OR?:

 select t1.id, t1.name, t1.dept_name, t1.mgr_id, t2.mgr_id, t1.sal, t2.sal from employee t1
    inner join employee t2 on t2.id = t1.id and t2.name = t1.name
    where t1.id in
    (select id from 
    (select id, name, dept_name, count(*) from employee
    group by id, name, dept_name
    having count(*) > 1))
    and ((nvl(t1.mgr_id,0) <> nvl(t2.mgr_id,0) AND t1.mgr_id is not null) 
    or (nvl(t1.sal,0) <> nvl(t2.sal,0) AND t1.sal is not null));

Upvotes: 2

Related Questions