Adrian P.
Adrian P.

Reputation: 5228

MySQL optimize complicate query NOT IN (2 SELECT WITH JOIN) WHERE

I have the following query:

SELECT driver_id, first_name, last_name
    FROM drivers
    WHERE driver_id NOT IN 
    (SELECT DISTINCT w.driver_id from waybills w 
            JOIN drivers d ON d.driver_id = w.driver_id 
            WHERE  w.waybill_owner = 1 
            AND w.waybill_status = 'dispatched'
            AND w.delivery_date = '2014-10-28')
    AND driver_id NOT IN 
    (SELECT DISTINCT wm.driver_id from waybill_movements wm 
            JOIN drivers d ON d.driver_id = wm.driver_id 
            WHERE  wm.movement_owner = 1 
            AND wm.delivery_date = '2014-10-28')
    AND status = 'active'
    AND driver_owner = 1
    ORDER BY last_name ASC

How can I optimize this query?

Query works well and return expected results but my question is if the query can be optimized.

Many thanks for your time and help.

Update:

And, yes, I have these indexes:

waybills(waybill_owner, waybill_status, w.delivery_date) waybill_movements (wm.movement_owner, delivery_date) drivers(driver_id a primary key and drivers(status, driver_owner)

The table structure is not need it for optimization

I didn't expect to have so many answers. Thank you all.

Upvotes: 2

Views: 59

Answers (4)

DRapp
DRapp

Reputation: 48139

I have typically found that doing left-joins and looking for NULL of the table result easier and also great utilization of the indexes that can be made.

The drivers table I would have an index ON (driver_owner, status, driver_id)
Your waybill table, index ON(waybill_owner, driver_id, delivery_date, waybill_status)
waybill_movements, index ON(movement_owner, driver_id, delivery_date )


SELECT 
      d1.driver_id, 
      d1.first_name, 
      d1.last_name
   FROM 
      drivers d1
         LEFT JOIN waybills w
            ON d1.driver_id = w.driver_id
            AND d1.driver_owner = w.waybill_owner
            AND w.waybill_status = 'dispatched'
            AND w.delivery_date = '2014-10-28'
         LEFT JOIN waybill_movements wm 
            ON d1.driver_id = wm.driver_id 
            AND d1.driver_owner = wm.movement_owner
            AND wm.delivery_date = '2014-10-28'
   where
          d1.driver_owner = 1
      AND d1.status = 'active'
      AND w.driver_ID IS NULL
      AND wm.driver_id IS NULL
   ORDER BY 
      d1.last_name ASC

By including the "IS NULL" for the waybill and waybill_movements, you are only getting those that DO NOT have a matching record.

Also, seeing your waybill_owner = 1... is that coincidence to that of the driver_owner = 1 also? IF the waybill owner should always be the same as the driver owner, then I would change the index ON the waybill table to have waybill_owner first, then driver_id, then the rest AND change the joins to be based on the drivers.driver_owner = waybills.waybill_owner (similarly on waybill_movements)

REVISED per feedback

Revised the indexes and the query to join based on the same drivers.driver_owner table to the waybill tables to prevent false results of cross-owner matches by whatever strange possibility may exist.

Upvotes: 1

JNevill
JNevill

Reputation: 50034

It depends on what the definition of is is. You may want to rewrite those WHERE IN subqueries into your FROM clause and then run your existing query against that, head to head, and see if MySQL explain gives a different execution path and stats for that.

No doubt it will since you pretty much have to hit your drivers table three times in your existing query in order to get to the results you want. This is totally unnecessary overhead.

SELECT d.driver_id, d.first_name, d.last_name
FROM drivers d
    LEFT OUTER JOIN waybills w ON
        d.driver_id = w.driver_id  AND
        w.waybill_owner = 1 AND
        w.waybill_status = 'dispatched' AND
        w.delivery_date = '2014-10-28'
    LEFT OUTER JOIN waybill_movements wm ON
        d.driver_id = wm.driver_id AND
        wm.movement_owner = 1 AND
        wm.delivery_date = '2014-10-28')
WHERE
    w.driver_id IS NULL AND
    wm.driver_id IS NULL AND
    d.status = 'active' AND
    d.driver_owner = 1
ORDER BY last_name ASC  

Your ORDER BY is also expensive, as they always are. If it is not needed, then it might be a good idea to remove it.

You may want to run @StuartLC's suggested SQL and this one through explain and see MySQL likes either one of them better. You may find better results with his, but sometimes it's a coin flip. MySQL is generally not terribly speedy about derived tables, so cramming as much into a traditional join as possible can yield better results. But it totally depends on the size of the tables, whether or not they have proper indexes and all that fun stuff.

Upvotes: 2

StuartLC
StuartLC

Reputation: 107247

Assuming that you have already looked at the indexing

  • waybills(waybill_owner, waybill_status, w.delivery_date)
  • waybill_movements (wm.movement_owner, delivery_date)
  • drivers(driver_id) - presumably a primary key? and possibly also drivers(status, driver_owner)

Two other improvements spring to mind

  • Neither of the subqueries actually use the Joined driver table. Assuming this is intentional, you can either remove the join (if the intention is not to filter waybills / movements which don't have ANY driver at all), or just filter on where driver_id IS NOT NULL if this this is a nullable foreign key
  • If you union the two subqueries, you can then do a single NOT IN

SELECT driver_id, first_name, last_name
    FROM drivers
    WHERE driver_id NOT IN 
    (
      SELECT DISTINCT w.driver_id
         from waybills w 
            WHERE  w.waybill_owner = 1 
            AND w.waybill_status = 'dispatched'
            AND w.delivery_date = '2014-10-28'
      UNION
      SELECT DISTINCT wm.driver_id 
        from waybill_movements wm 
            WHERE  wm.movement_owner = 1 
            AND wm.delivery_date = '2014-10-28'
    )
    AND status = 'active'
    AND driver_owner = 1
    ORDER BY last_name ASC

Upvotes: 1

MIvanIsten
MIvanIsten

Reputation: 490

SELECT d.driver_id, d.first_name, d.last_name 
FROM drivers AS d
LEFT JOIN waybills AS w 
    ON d.driver_id = w.driver_id 
    AND w.waybill_owner = 1
    AND w.waybill_status = 'dispatched'
    AND w.delivery_date = '2014-10-28'
LEFT JOIN waybill_movements AS wm 
    ON d.driver_id = wm.driver_id
    AND wm.movement_owner = 1 
    AND wm.delivery_date = '2014-10-28'
WHERE w.driver_id IS NULL 
AND wm.driver_id IS NULL
AND d.status = 'active'
AND d.driver_owner = 1
ORDER BY d.last_name ASC

Upvotes: 0

Related Questions