Alter Lagos
Alter Lagos

Reputation: 12550

Simplify and/or optimize sql query with INTERSECT or HAVING

I have the following data:

 tag_id | taggable_id 
--------+-------------
      1 |           1
      2 |           1
      3 |           1
      4 |           1
      1 |           2
      2 |           2
      5 |           2
      6 |           2
      7 |           3
      8 |           3
      9 |           3
     10 |           3

And I want to bring all the taggable_ids which are in a group of tag_ids AND in another group AND another... (max 4 groups).
For example:

In a gross way(for the second example), the query is the following:

SELECT taggable_id FROM taggings WHERE tag_id in (1)
INTERSECT
SELECT taggable_id FROM taggings WHERE tag_id in (6)
INTERSECT
SELECT taggable_id FROM taggings WHERE tag_id in (2,8)

I think simplifying it, it looks like:

SELECT taggable_id FROM taggings WHERE tag_id in (1,2,6,8)
GROUP BY taggable_id HAVING COUNT(*)=3

but I'm wondering if can be done in a simpler way. Any thoughts?

Upvotes: 1

Views: 766

Answers (3)

Erwin Brandstetter
Erwin Brandstetter

Reputation: 656411

This can be cast as a case of relational division. We have assembled an arsenal of query techniques under this related question:

Depending on data distribution and other factors, this may be fastest:

SELECT DISTINCT taggable_id
FROM   taggings t1
JOIN   taggings t2 USING (taggable_id)
JOIN   taggings t3 USING (taggable_id)
WHERE  t1.tag_id = 1
AND    t2.tag_id = 6
AND    t3.tag_id IN (2, 8);

Assuming unique (tag_id, taggable_id), DISTINCT is actually not needed for the example. But it might be necessary with other (list) predicates.

SQL Fiddle (building on @Clodoaldo's, thanks).

Upvotes: 3

Clodoaldo Neto
Clodoaldo Neto

Reputation: 125214

SQL Fiddle

Your second query fails if the tuple (8, 2) is inserted. Here is one solution although I don't know if simpler then the intersect one:

select taggable_id
from taggings
where tag_id in (1,2,6,8)
group by taggable_id
having
    array_agg(tag_id) @> array[1,2]
    and
    array_agg(tag_id) && array[6,8]

Upvotes: 2

Gordon Linoff
Gordon Linoff

Reputation: 1269583

Your second query is not equivalent to the first. Change the having clause to:

HAVING SUM(CASE WHEN tag_id = 1 THEN 1 ELSE 0 END) > 0 AND
       SUM(CASE WHEN tag_id = 6 THEN 1 ELSE 0 END) > 0 AND
       SUM(CASE WHEN tag_id IN (2, 8) THEN 1 ELSE 0 END) > 0;

As for performance, test the queries to see which performs better. The advantage of the having approach is you can add more and more complex conditions without having a significant impact on performance.

Upvotes: 1

Related Questions