Reputation: 14882
What's the most efficient way to select rows that must satisfy two conditions met in the same column?
name | title
------------------
John | Associate
John | Manager
Fran | Manager
Fran | President
I'd like to do something like
select name
from table
where title = 'Associate'
and name in ( select *
from table
where title = 'Manager')
which should return
John
but that seems woefully inefficient, especially if the table itself is super big. You could also do
select name
from table a,
table b
where a.title = 'Associate'
and b.title = 'Manager'
and a.name = b.name
Is that the best one can do?
Upvotes: 1
Views: 8395
Reputation: 10246
It depends on MySQL Version (MySQL 5.6 has query rewrite feature which improves IN()
subquery) and Table Relationships.
There are at least 3 ways to get result you're expecting. In my experience, INNE JOIN
is faster than others in general cases.
Try your self with your data.
please note that in MySQL. IN()
produces dependent sub-query
plan.
SELECT DISTINCT name
FROM table
WHERE title = 'Associate'
AND name IN (SELECT name FROM table WHERE title = 'Manager')
SELECT DISTINCT name
FROM table t1 INNER JOIN table t2
WHERE a.title = 'Associate' AND b.title = 'Manager'
AND t1.name = t2.name
EXISTS
is fast when tables have 1:n relationship. This requires no DISTINCT
, GROUP BY
.
SELECT name
FROM table t1
WHERE a.title = 'Associate'
AND EXISTS (SELECT 1 FROM table t2
WHERE t2.name = t1.name AND t2.title = 'Manager')
Upvotes: 0
Reputation: 35603
Using WHERE EXISTS (or NOT EXISTS) is a very effective alternative for this
select
name
from table1
where title = 'Associate'
and exists (
select 1 /* could be: select NULL as nothing actually needs to be "returned */
from table1 as t2
where t2.title = 'Manager'
and t2.name = table1.name /* correlated here */
)
;
Similar to using IN() it requires a subquery but "correlates" that subquery. However this subquery does NOT need to move any data (IN can require this).
Also, similar to IN() using EXISTS has no impact on the number of result rows. Joins can create unwanted row repetition (but of course sometimes the extra rows are needed).
This reference is for SQL Server, but it compares several relevant methods (ignore Outer Apply - mssql specific), including potential issues dealing with NULLs when using IN() that do not affect EXISTS
| NOT | EXISTS () should be one of the first methods to consider.
Upvotes: 0
Reputation: 48139
I would have an index on your table via ( title, name ), then do a self-join. In this case, I am putting what would be less probable as the outer primary condition of the query where the manager records are considered first... ie. a company may have 5 managers and 100 associates vs looking for 100 associates that match the 5 managers.
select
t.name
from
table t
join table t2
on t2.title = 'Associate'
AND t.name = t2.name
where
t.title = 'Manager'
Upvotes: 1
Reputation: 13765
There's not a whole lot of data given as an example, but I'm assuming both John's here are the same person with multiple titles? If that were the case you would be better off having your titles being a child table of the employees table (if that's what this table is)
So instead you could have:
Employee
----
id
name
titles
----
id
titleName
employeeTitles
----
employeeId
titleId
If you can't do it this way, i would think another way to write your original query would be:
select name
from table t1
inner join (
select distinct name
from table
where title = 'manager'
) t2 on t1.name = t2.name
where title = 'Associate'
could also do group by name rather than distinct. But still, doing the above solution i think would be better all around (assuming my own assumptions are correct about your data)
Upvotes: 0
Reputation: 1270081
Your first query is not syntactically correct. It should be:
select name
from table
where title = 'Associate' and
name in (select name from table where title = 'Manager');
The second is better written as a join
:
select name
from table a join
table b
on a.title = 'Associate' and b.title = 'Manager' and a.name = b.name;
The second is probably better in terms of taking advantage of indexes on the table.
You can also do this with a group by
:
select name
from table a
group by name
having sum(title = 'Associate') > 0 and sum(title = 'Manager') > 0;
MySQL is not very good at optimizing group by
. But if there are no indexes on the table, it might be faster than the join
methods.
Upvotes: 1