SQL Select all rows where subset exists

I'm sure there is an answer present for this question but bear with me as I'm new to SQL and am not sure how to ask the question.

I have data like this (this is shorthand purely for example). This is in a postgres db.

table1
id    value
1     111
1     112
1     113
2     111
2     112
2     116
3     111
3     122
3     123
4     126
5     123
5     125
6     111
6     112
6     116

table2
value
111
112
116

I need to return the ids from table1 for which every value in table2 also appears among that id's values in table1. So for this example, my query would return 2 and 6.

Is there any way to do this in SQL? Or could you possibly guide me to a data structure that would allow me to get this result? I am able to change the structure of either table to accommodate the ultimate need of getting this result.

Thank you so much. An answer to this would be a life saver.

Upvotes: 4

Answers (4)

wildplasser

Reputation: 44250

NOT EXISTS(... NOT EXISTS) is a standard solution to relational division:

SELECT DISTINCT id
FROM table1 t1
WHERE NOT EXISTS (
        -- a value in table2 ...
        SELECT * FROM table2 t2
        WHERE NOT EXISTS (
                -- ... that is not paired with this id in table1
                SELECT * FROM table1 t1x
                WHERE t1x.value = t2.value
                AND t1x.id = t1.id
                )
        );

In this case, the DISTINCT is needed because we don't have access to the domain table with ids, only to the junction table t1 referring to it.
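For comparison, here is a rough sketch of the same double NOT EXISTS when a domain table is available. The table name ids is hypothetical and not part of the question; it is derived from table1 here only so that the example is self-contained.

```sql
-- Hypothetical domain table with one row per id (an assumption, not in the question).
CREATE TEMP TABLE ids AS
SELECT DISTINCT id FROM table1;

-- Each id occurs once in ids, so DISTINCT is no longer needed.
SELECT i.id
FROM ids i
WHERE NOT EXISTS (
        SELECT * FROM table2 t2
        WHERE NOT EXISTS (
                SELECT * FROM table1 t1x
                WHERE t1x.value = t2.value
                AND t1x.id = i.id
                )
        );
```

With the sample data from the question this should again return ids 2 and 6.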

Upvotes: 0

onedaywhen

Reputation: 57023

It seems to me that as much as anything you want to know how to ask the right question. The magic words here are "relational division".

It is one of the operators in Codd's relational algebra and there have been several variations proposed since. Most recently, Chris Date has proposed replacing the whole concept with image relations.

SQL has no explicit divide operator. There are a number of workarounds using other operators, and the most appropriate will depend on your requirements, including exact division or division with remainder and how to handle an empty divisor. Then there are the usual considerations: SQL product and version, performance, personal style and taste, etc.
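To illustrate the difference, here is a minimal sketch of exact division against the question's table1 and table2: it keeps only ids whose set of values equals table2 exactly, with no extra values. It is one possible formulation, not the only one, and it assumes neither table contains NULL values.

```sql
-- Exact division: the id's values must match table2 with nothing left over.
SELECT t1.id
FROM   table1 t1
LEFT   JOIN table2 t2 ON t2.value = t1.value
GROUP  BY t1.id
HAVING count(DISTINCT t2.value) = (SELECT count(DISTINCT value) FROM table2)  -- every divisor value is matched
   AND count(DISTINCT t1.value) = count(DISTINCT t2.value);                  -- and the id has no other values
```

Dropping the second HAVING condition gives division with remainder, where ids are allowed to carry additional values. Note that with an empty table2 this sketch returns no ids, whereas a double NOT EXISTS formulation returns every id, so the empty-divisor case is worth deciding on explicitly.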

Here are a couple of articles which should help you with these choices:

On Making Relational Division Comprehensible

Divided We Stand: The SQL of Relational Division

Upvotes: 3

user731136

Reputation:

UPDATE: Another possibility:

SELECT t1.id
FROM (SELECT t1.id, t1.value
      FROM table1 t1
      JOIN  table2 t2 USING (value)
      GROUP BY t1.id, t1.value
      ORDER BY t1.id) t1
GROUP BY t1.id      
HAVING COUNT(*) = (SELECT COUNT(*) FROM table2)

If you run EXPLAIN ANALYZE, the cost of my query is always 893-900, even with repeated rows.

Upvotes: 1

Erwin Brandstetter

Reputation: 656844

Consider this demo:

CREATE TEMP TABLE table1(id int, value int);
INSERT INTO table1 VALUES
 (1,111),(1,112),(1,113)
,(2,111),(2,112),(2,116)
,(3,111),(3,122),(3,123)
,(4,126)
,(5,123),(5,125)
,(6,111),(6,112),(6,116);

CREATE TEMP TABLE table2(value int);
INSERT INTO table2 VALUES
 (111)
,(112)
,(116);

SELECT t1.id
FROM   table1 t1
JOIN   table2 t2 USING (value)
GROUP  BY t1.id
HAVING count(*) = (SELECT count(*) FROM table2)
ORDER  BY t1.id;

Result:

id
-----
2
6

Returns all ids of table1 that appear with every value in table2, assuming each (id, value) pair occurs only once.
Works for any number of rows in both tables.

If duplicate rows can appear in table1, change the HAVING clause to:

HAVING count(DISTINCT value) = (SELECT count(*) FROM table2)
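Put together, the duplicate-tolerant query might look like the sketch below. Using count(DISTINCT value) on the table2 side as well is an extra precaution of this sketch, for the case where table2 itself can contain repeated values.

```sql
SELECT t1.id
FROM   table1 t1
JOIN   table2 t2 USING (value)
GROUP  BY t1.id
-- count each matched value once, no matter how often it is repeated in either table
HAVING count(DISTINCT value) = (SELECT count(DISTINCT value) FROM table2)
ORDER  BY t1.id;
```

Against the demo tables above this returns ids 2 and 6, the same as the simpler version.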

Upvotes: 6
