kadir
kadir

Reputation: 1417

Select id on grouped unique set of data

My Problem is showing only id's of grouped unique Datasets. A simple example would serve best:

| id | color |
--------------
| 1  | red   |
--------------
| 1  | green |
--------------
| 1  | blue  |
--------------
| 2  | red   |
--------------
| 2  | green |
--------------
| 2  | blue  |
--------------
| 3  | red   |
--------------
| 3  | blue  |
--------------
| 3  | yellow|
--------------
| 3  | purple|
--------------

Id 1 and id 2 have the same subset of data (red, green, blue), so the result table should only contain either 1 OR 2:

| id |
------
| 1  |
------
| 3  | 
------

I guess this relatively basic Problem was asked multiple times but I could not nail the specific keywords which would produce results.

Upvotes: 0

Views: 80

Answers (2)

CL.
CL.

Reputation: 180070

SQL is set-oriented, so let's try this:

Unique IDs are IDs for which no other ID exists that has an equal set of colors.

To determine whether two IDs have equal sets of colors, we subtract them from each other (this is what EXCEPT does) and test whether the result is empty in both directions:

SELECT id
FROM (SELECT DISTINCT id FROM t) AS t1
WHERE NOT EXISTS (SELECT id FROM (SELECT DISTINCT id FROM t) AS t2
                  WHERE t2.id < t1.id
                    AND NOT EXISTS (SELECT color FROM t WHERE id = t1.id
                                    EXCEPT
                                    SELECT color FROM t WHERE id = t2.id)
                    AND NOT EXISTS (SELECT color FROM t WHERE id = t2.id
                                    EXCEPT
                                    SELECT color FROM t WHERE id = t1.id));

SQL Fiddle

Upvotes: 1

Gordon Linoff
Gordon Linoff

Reputation: 1269853

Although SQLite has group_concat(), it won't help here because the order of the concatenated elements is arbitrary. That is the easiest way to do this.

Instead, we have to think of this relationally. The idea is to do the following:

  1. Count the number of colors that two ids have in common
  2. Count the number of colors on each id
  3. Select id pairs where these three values are equal
  4. Identify each pair by the minimum id in the pair

Then distinct values of the minimum are the list you want.

The following query takes this approach:

select distinct MIN(id2)
from (select t1.id as id1, t2.id as id2, count(*) as cnt
      from t t1 join
           t t2
           on t1.color = t2.color
      group by t1.id, t2.id
     ) t1t2 join
     (select t.id, COUNT(*) as cnt
      from t
      group by t.id
     ) t1sum
     on t1t2.id1 = t1sum.id and t1sum.cnt = t1t2.cnt join
     (select t.id, COUNT(*) as cnt
      from t
      group by t.id
     ) t2sum
     on t1t2.id2 = t2sum.id and t2sum.cnt = t1t2.cnt
group by t1t2.id1, t1t2.cnt, t1sum.cnt, t2sum.cnt

I actually tested this in SQL Server by placing this with clause in front:

with t as (
      select 1 as id, 'r' as color union all
      select 1, 'g' union all
      select 1, 'b' union all
      select 2 as id, 'r' as color union all
      select 2, 'g' union all
      select 2, 'b' union all
      select 3, 'r' union all
      select 4, 'y' union all
      select 4, 'p' union all
      select 5 as id, 'r' as color union all
      select 5, 'g' union all
      select 5, 'b' union all
      select 5, 'p'
     )

Upvotes: 1

Related Questions