Reputation: 1417
My Problem is showing only id's of grouped unique Datasets. A simple example would serve best:
| id | color |
--------------
| 1 | red |
--------------
| 1 | green |
--------------
| 1 | blue |
--------------
| 2 | red |
--------------
| 2 | green |
--------------
| 2 | blue |
--------------
| 3 | red |
--------------
| 3 | blue |
--------------
| 3 | yellow|
--------------
| 3 | purple|
--------------
Id 1 and id 2 have the same subset of data (red, green, blue), so the result table should only contain either 1 OR 2:
| id |
------
| 1 |
------
| 3 |
------
I guess this relatively basic Problem was asked multiple times but I could not nail the specific keywords which would produce results.
Upvotes: 0
Views: 80
Reputation: 180070
SQL is set-oriented, so let's try this:
Unique IDs are IDs for which no other ID exists that has an equal set of colors.
To determine whether two IDs have equal sets of colors, we subtract them from each other (this is what EXCEPT
does) and test whether the result is empty in both directions:
SELECT id
FROM (SELECT DISTINCT id FROM t) AS t1
WHERE NOT EXISTS (SELECT id FROM (SELECT DISTINCT id FROM t) AS t2
WHERE t2.id < t1.id
AND NOT EXISTS (SELECT color FROM t WHERE id = t1.id
EXCEPT
SELECT color FROM t WHERE id = t2.id)
AND NOT EXISTS (SELECT color FROM t WHERE id = t2.id
EXCEPT
SELECT color FROM t WHERE id = t1.id));
Upvotes: 1
Reputation: 1269853
Although SQLite has group_concat()
, it won't help here because the order of the concatenated elements is arbitrary. That is the easiest way to do this.
Instead, we have to think of this relationally. The idea is to do the following:
Then distinct values of the minimum are the list you want.
The following query takes this approach:
select distinct MIN(id2)
from (select t1.id as id1, t2.id as id2, count(*) as cnt
from t t1 join
t t2
on t1.color = t2.color
group by t1.id, t2.id
) t1t2 join
(select t.id, COUNT(*) as cnt
from t
group by t.id
) t1sum
on t1t2.id1 = t1sum.id and t1sum.cnt = t1t2.cnt join
(select t.id, COUNT(*) as cnt
from t
group by t.id
) t2sum
on t1t2.id2 = t2sum.id and t2sum.cnt = t1t2.cnt
group by t1t2.id1, t1t2.cnt, t1sum.cnt, t2sum.cnt
I actually tested this in SQL Server by placing this with
clause in front:
with t as (
select 1 as id, 'r' as color union all
select 1, 'g' union all
select 1, 'b' union all
select 2 as id, 'r' as color union all
select 2, 'g' union all
select 2, 'b' union all
select 3, 'r' union all
select 4, 'y' union all
select 4, 'p' union all
select 5 as id, 'r' as color union all
select 5, 'g' union all
select 5, 'b' union all
select 5, 'p'
)
Upvotes: 1