jobinbasani
jobinbasani

Reputation: 2141

Correct way to use SQL Intersect

I'm trying to build a query for the following scenario:

I have two tables Table1 and Table2.

The primary keys of Table1 goes like T1Attr1, T1Attr2 and so on.

Corresponding to each primary key in Table1, I can get a set of attributes from Table2 which goes like T2Attr1, T2Attr2 and so on.

I'm trying to query for the attributes which are common to the attributes of Table1, for example, if the input is T1Attr1 and T1Attr2, the results should have the attributes common to both of them from Table2. As the input parameters grows, the results would be less since common-to-all attributes would be less.

My query is similar to this:

Select indId, indName from indData where pId =1

intersect
Select indId, indName from indData where pId =2

intersect
Select indId, indName from indData where pId =3

The query works fine but when the pId list is huge(above 100), jdbc driver throws an error message.

Can someone please provide suggestions on using this query correctly or provide a better approach for the problem?

Thanks!

Upvotes: 0

Views: 119

Answers (2)

ypercubeᵀᴹ
ypercubeᵀᴹ

Reputation: 115520

You can use this query but it won't be as efficient as the one you have:

SELECT indId, indName 
FROM indData 
WHERE pId IN (1, 2, 3, ..., 100)
GROUP BY indId, indName
HAVING COUNT(DISTINCT pId) = 100 ;  -- the number of pId you are searching on

You can also use JOINs. Perhaps this will result in a better execution plan and not cause this error. If there is a unique constraint on (indId, pId) this will be equivalent to your query:

SELECT a1.indId, a1.indName 
FROM indData AS a1
  JOIN indData AS a2
    ON a2.indId = a1.indId
  JOIN indData AS a3
    ON a3.indId = a1.indId 
  ...
  JOIN indData AS a100
    ON a100.indId = a1.indId
WHERE a1.pId = 1
  AND a2.pId = 2
  ...
  AND a100.pId = 100 ;

An index on (pId, indID) INCLUDE (indName) would help efficiency.

Upvotes: 3

Gordon Linoff
Gordon Linoff

Reputation: 1269603

Intersect is not the only way to do what you want. Your query is an example of a "set-within-sets" query. The "set" is the indid, indname pair. The "within-sets" are having all three values for pid.

I advocate using aggregation with a having clause for this type of query, because this is a very flexible approach for many types of conditions. In your case, the results query is:

select indid, indname
from indData
group by indid, indname
having SUM(case when pid = 1 then 1 else 0 end) > 0 and
       SUM(case when pid = 2 then 1 else 0 end) > 0 and
       SUM(case when pid = 3 then 1 else 0 end) > 0;

If you have an index on pid and the values are relatively rare, than adding a where pid in (1, 2, 3) could benefit the query performance-wise.

Upvotes: 2

Related Questions