chpatrick
chpatrick

Reputation: 481

SQL find identical group

Given a table like:

id    key   val
----  ----  -----
bob   hair  red
bob   eyes  green

And another table like:

id    key   val
----  ----  -----
fred  hair  red
fred  eyes  green
fred  shoe  42
joe   hair  red 
joe   eyes  green
greg  eyes  blue
greg  hair  brown

I'd like to find people in table b who match people in table a exactly, in this case Bob and Joe. Fred doesn't count because he also has a shoe size. This is in Sybase so there's no full outer join. I've come up with a select of a select with a union that returns people who definitely aren't the same, but I'm not sure how to efficiently select people who are.

Alternatively, if it's simpler, how can I check which groups in a occur in b more than once?

Upvotes: 5

Views: 449

Answers (3)

niktrs
niktrs

Reputation: 10066

Try this

select a.id,b.id
from a 
join b on a.[key] = b.[key] and a.val = b.val -- match all rows
join (select id,count(*) total from a group by id) a2 on a.id = a2.id -- get the total keys for table a per id
join (select id,count(*) total from b group by id) b2 on b.id = b2.id -- get the total keys for table b per id
group by a.id,b.id,a2.total,b2.total
having count(*) = a2.total AND count(*) = b2.total -- the matching row's total should be equal with each tables keys per id

After @t-clausen.dk comments I made a revision of the original sql code. In this case i count each distinct pair/value that matches on both tables, with each tables distinct pair/value.

select td.aid,td.bid
from (
select a.id as aid,b.id as bid, count(distinct a.[key]+' '+a.val) total
from a 
join b on a.[kry] = b.[key] and a.val = b.val
group by a.id,b.id
) td -- match all distinct attribute rows
join (select id,count(distinct [key]+' '+val) total from a group by id) a2 on td.aid = a2.id -- get the total distinct keys for table a per id
join (select id,count(distinct [key]+' '+val) total from b group by id) b2 on td.bid = b2.id -- get the total keys for table b per id
where td.total = a2.total AND td.total = b2.total -- the matching distinct attribute total should be equal with each tables distinct key-val pair

Tested on

Table a

bob     hair    red
bob     eyes    green
nick    hair    red
nick    eyes    green
nick    shoe    45

Table b

fred    hair    red
fred    eyes    green
joe     hair    red
joe     eyes    green
fred    shoe    42

Upvotes: 2

t-clausen.dk
t-clausen.dk

Reputation: 44336

This syntax will find the exact matches on different names in @t1 and @t2. I appologize because is written in MSSQL. I hope it can be converted to Sybase. After playing with it all day I want to share this beauty. I know these long scripts are not popular pointwise. I hope someone will appriciate it anyway.

This select make an exact match on @t2 within @t1.

I have populated the tables in this link https://data.stackexchange.com/stackoverflow/q/108035/

DECLARE @t1 TABLE(id varchar(10), [key] varchar(10), val varchar(10))
DECLARE @t2 TABLE(id varchar(10), [key] varchar(10), val varchar(10))

;WITH t1 AS ( 
SELECT t1.id, t1.[key], t1.val, count(*) count1, sum(count(*)) OVER(PARTITION BY t1.id) sum1 FROM @t1 t1 
GROUP BY t1.id, t1.[key], t1.val
), t2 as (
SELECT t2.id, t2.[key], t2.val, count(*) count1, sum(count(*)) OVER(PARTITION BY t2.id) sum1 FROM @t2 t2 
GROUP BY t2.id, t2.[key], t2.val
), t3 AS ( 
SELECT t1.*, sum(t1.count1) OVER(PARTITION BY t1.id) sum2
FROM t1 
JOIN t2 on t1.val = t2.val AND t1.[key]=t2.[key]
AND t1.count1 = t2.count1 AND t1.sum1 = t2.sum1
)
SELECT t3.id, t3.[key], t3.val FROM t3
JOIN @t2 t ON t3.[key] = t.[key] AND t3.val = t.val
WHERE t3.sum2 = t3.sum1

Don't try the script, it doesn't contain data, use the link where the tables are populated.

Upvotes: 1

Andomar
Andomar

Reputation: 238216

You can emulate a full outer join by grabbing all ids in a subquery, and then left joining them in two directions:

select  ids.id
from    (
        select  distinct id
        from    @a
        union
        select  id
        from    @b
        ) as ids
left join
        @a a1
on      a1.id = ids.id
left join
        @b b1
on      a1.id = b1.id
        and a1.[key] = b1.[key]
        and a1.val = b1.val
left join
        @b b2
on      b2.id = ids.id
left join
        @a a2
on      b2.id = a2.id
        and b2.[key] = a2.[key]
        and b2.val = a2.val
group by
        ids.id
having  sum(case when b1.id is null or a2.id is null then 1 else 0 end) = 0

Example at SE DATA.

Upvotes: 2

Related Questions