Select id on grouped unique set of data

Question

My Problem is showing only id's of grouped unique Datasets. A simple example would serve best:

| id | color |
--------------
| 1  | red   |
--------------
| 1  | green |
--------------
| 1  | blue  |
--------------
| 2  | red   |
--------------
| 2  | green |
--------------
| 2  | blue  |
--------------
| 3  | red   |
--------------
| 3  | blue  |
--------------
| 3  | yellow|
--------------
| 3  | purple|
--------------

Id 1 and id 2 have the same subset of data (red, green, blue), so the result table should only contain either 1 OR 2:

| id |
------
| 1  |
------
| 3  | 
------

I guess this relatively basic Problem was asked multiple times but I could not nail the specific keywords which would produce results.

Gordon Linoff · Accepted Answer

Although SQLite has group_concat(), it won't help here because the order of the concatenated elements is arbitrary. That is the easiest way to do this.

Instead, we have to think of this relationally. The idea is to do the following:

Count the number of colors that two ids have in common
Count the number of colors on each id
Select id pairs where these three values are equal
Identify each pair by the minimum id in the pair

Then distinct values of the minimum are the list you want.

The following query takes this approach:

select distinct MIN(id2)
from (select t1.id as id1, t2.id as id2, count(*) as cnt
      from t t1 join
           t t2
           on t1.color = t2.color
      group by t1.id, t2.id
     ) t1t2 join
     (select t.id, COUNT(*) as cnt
      from t
      group by t.id
     ) t1sum
     on t1t2.id1 = t1sum.id and t1sum.cnt = t1t2.cnt join
     (select t.id, COUNT(*) as cnt
      from t
      group by t.id
     ) t2sum
     on t1t2.id2 = t2sum.id and t2sum.cnt = t1t2.cnt
group by t1t2.id1, t1t2.cnt, t1sum.cnt, t2sum.cnt

I actually tested this in SQL Server by placing this with clause in front:

with t as (
      select 1 as id, 'r' as color union all
      select 1, 'g' union all
      select 1, 'b' union all
      select 2 as id, 'r' as color union all
      select 2, 'g' union all
      select 2, 'b' union all
      select 3, 'r' union all
      select 4, 'y' union all
      select 4, 'p' union all
      select 5 as id, 'r' as color union all
      select 5, 'g' union all
      select 5, 'b' union all
      select 5, 'p'
     )

Select id on grouped unique set of data

Answers (2)

Related Questions