Reputation: 14882

Selection SQL where two conditions must be true

What's the most efficient way to select rows that must satisfy two conditions met in the same column?

name | title
------------------
John | Associate
John | Manager
Fran | Manager
Fran | President

I'd like to do something like

select name 
  from table 
 where title = 'Associate' 
   and name in ( select * 
                   from table 
                  where title = 'Manager')

which should return

John

but that seems woefully inefficient, especially if the table itself is super big. You could also do

select name 
  from table a, 
       table b 
 where a.title = 'Associate' 
   and b.title = 'Manager' 
   and a.name = b.name

Is that the best one can do?

Upvotes: 1

Answers (5)

Jason Heo

Reputation: 10246

It depends on MySQL Version (MySQL 5.6 has query rewrite feature which improves IN() subquery) and Table Relationships.

There are at least 3 ways to get result you're expecting. In my experience, INNE JOIN is faster than others in general cases.

Try your self with your data.

IN () - you've wrote firstly.

please note that in MySQL. IN() produces dependent sub-query plan.

SELECT DISTINCT name
FROM table
WHERE title = 'Associate'
  AND name IN (SELECT name FROM table WHERE title = 'Manager')

SELF JOIN - your 2nd

SELECT DISTINCT name
FROM table t1 INNER JOIN table t2
WHERE a.title = 'Associate' AND b.title = 'Manager'
  AND t1.name = t2.name

EXISTS - Semi JOIN

EXISTS is fast when tables have 1:n relationship. This requires no DISTINCT, GROUP BY.

SELECT name
FROM table t1
WHERE a.title = 'Associate'
  AND EXISTS (SELECT 1 FROM table t2
              WHERE t2.name = t1.name AND t2.title = 'Manager')

Upvotes: 0

Paul Maxwell

Reputation: 35603

Using WHERE EXISTS (or NOT EXISTS) is a very effective alternative for this

select
        name
from table1
where title = 'Associate' 
and exists (
            select 1 /* could be: select NULL as nothing actually needs to be "returned */
            from table1 as t2
            where t2.title = 'Manager'
            and t2.name = table1.name /* correlated here */
           )
;

Similar to using IN() it requires a subquery but "correlates" that subquery. However this subquery does NOT need to move any data (IN can require this).

Also, similar to IN() using EXISTS has no impact on the number of result rows. Joins can create unwanted row repetition (but of course sometimes the extra rows are needed).

This reference is for SQL Server, but it compares several relevant methods (ignore Outer Apply - mssql specific), including potential issues dealing with NULLs when using IN() that do not affect EXISTS

| NOT | EXISTS () should be one of the first methods to consider.

Upvotes: 0

DRapp

Reputation: 48139

I would have an index on your table via ( title, name ), then do a self-join. In this case, I am putting what would be less probable as the outer primary condition of the query where the manager records are considered first... ie. a company may have 5 managers and 100 associates vs looking for 100 associates that match the 5 managers.

select
      t.name 
   from 
      table t
         join table t2
            on t2.title = 'Associate' 
           AND t.name = t2.name
   where 
      t.title = 'Manager'

Upvotes: 1

Kritner

Reputation: 13765

There's not a whole lot of data given as an example, but I'm assuming both John's here are the same person with multiple titles? If that were the case you would be better off having your titles being a child table of the employees table (if that's what this table is)

So instead you could have:

Employee
----
id
name

titles
----
id
titleName

employeeTitles
----
employeeId
titleId

If you can't do it this way, i would think another way to write your original query would be:

select name 
from table t1
inner join (
    select distinct name
    from table
    where title = 'manager'
) t2 on t1.name = t2.name
where title = 'Associate'

could also do group by name rather than distinct. But still, doing the above solution i think would be better all around (assuming my own assumptions are correct about your data)

Upvotes: 0

Gordon Linoff

Reputation: 1270081

Your first query is not syntactically correct. It should be:

select name
from table
where title = 'Associate' and
     name in (select name from table where title = 'Manager');

The second is better written as a join:

select name
from table a join
     table b
     on a.title = 'Associate' and b.title = 'Manager' and a.name = b.name;

The second is probably better in terms of taking advantage of indexes on the table.

You can also do this with a group by:

select name
from table a
group by name
having sum(title = 'Associate') > 0 and sum(title = 'Manager') > 0;

MySQL is not very good at optimizing group by. But if there are no indexes on the table, it might be faster than the join methods.

Upvotes: 1

Selection SQL where two conditions must be true

Answers (5)

IN () - you've wrote firstly.

SELF JOIN - your 2nd

EXISTS - Semi JOIN

Related Questions