Alex Eberly
Alex Eberly

Reputation: 25

Optimizing these nested IN statements by using JOIN in MySQL

I'm writing a query to exclude a certain group of employees from a table. Let's say I have Table 1, transaction_details, that has information that I want to select. Table 2, employee_task_associations, maps each employee to a task that they were assigned to on a particular day. It has a field for the employee_id, and a field called map_id which is the id of the different tasks. This table is an associative table so that Tables 1 and 3 can have a many-to-many relationship. Table 3, employee_tasks, has a list of all tasks that an employee can have.

I have written this query, which is functional, but not optimized:

SELECT someInfo FROM transaction_details TD
WHERE TD.employee_id NOT IN
  (SELECT employee_id from employee_task_associations ETA
    WHERE map_id IN
      (SELECT id FROM employee_tasks ET
       WHERE ET.taskName = "The task I want to exclude"))

While this works, it will run multiple queries. I want to speed things up by replacing my nested NOT IN and IN statements with JOINS.

I know that I can replace the bottom four lines with the following:

SELECT employee_id FROM employee_task_assocations ETA
  LEFT OUTER JOIN employee_tasks ET
    ON ETA.map_id = ET.id
    WHERE ET.taskName = "The task I want to exclude"

This will return a list of all ids of the employees that have had this task. I want to exclude these from my SELECT statement from transaction_details by using a JOIN instead of a subquery. I have tried using a LEFT OUTER JOIN WHERE ETA.id IS NULL, but this does not work. How can I use a JOIN to exclude certain employees in this case?

Upvotes: 0

Views: 30

Answers (2)

The Impaler
The Impaler

Reputation: 48810

You seem to think that an outer join is more performant than a subquery, but that's not true. It all depends on the SQL planner, SQL optimizer, existing indexes, table stats, and definitively on the repertoire of data operators your database engine offers.

Also, you should consider that after parsing, the query enters the transformation phase where the database engine is free to rewrite your query in a more efficient way. This means that behind the scenes your query may actually be executed using outer joins, even if you write subqueries. After that the [rewritten] query enters the query planner, and later the SQL optimizer.

The only way of optimizing it is to get the execution plan of all query options and compare them.

Upvotes: 0

DRapp
DRapp

Reputation: 48139

You appear to be close on your initial query, but why not have the NOT IN query joined and get distinct employees... Something like

SELECT 
      TD.someInfo 
   FROM 
      transaction_details TD
   WHERE 
      TD.employee_id NOT IN
         (SELECT DISTINCT
                employee_id 
             from 
                employee_task_associations ETA
                   JOIN employee_tasks ET
                      ON ETA.map_id = ET.ID
                      AND ET.taskName = "The task I want to exclude")

Upvotes: 1

Related Questions