well actually
well actually

Reputation: 12370

Optimizing join on derived table - EXPLAIN different on local and server

I have the following ugly query, which runs okay but not great, on my local machine (1.4 secs, running v5.7). On the server I'm using, which is running an older version of MySQL (v5.5), the query just hangs. It seems to get caught on "Copying to tmp table":

SELECT
  SQL_CALC_FOUND_ROWS
  DISTINCT p.parcel_number,
  p.street_number,
  p.street_name,
  p.site_address_city_state,
  p.number_of_units,
  p.number_of_stories,
  p.bedrooms,
  p.bathrooms,
  p.lot_area_sqft,
  p.cost_per_sq_ft,
  p.year_built,
  p.sales_date,
  p.sales_price,
  p.id
  FROM (
    SELECT APN, property_case_detail_id FROM property_inspection AS pi
      GROUP BY APN, property_case_detail_id
      HAVING 
      COUNT(IF(status='Resolved Date', 1, NULL)) = 0
    ) as open_cases
  JOIN property AS p
  ON p.parcel_number = open_cases.APN
  LIMIT 0, 1000;

mysql> show processlist;
+-------+-------------+-----------+--------------+---------+------+----------------------+------------------------------------------------------------------------------------------------------+
| Id    | User        | Host      | db           | Command | Time | State                | Info                                                                                                 |
+-------+-------------+-----------+--------------+---------+------+----------------------+------------------------------------------------------------------------------------------------------+
| 21120 | headsupcity | localhost | lead_housing | Query   |   21 | Copying to tmp table | SELECT
          SQL_CALC_FOUND_ROWS
          DISTINCT p.parcel_number,
          p.street_numbe |
| 21121 | headsupcity | localhost | lead_housing | Query   |    0 | NULL                 | show processlist                                                                                     |
+-------+-------------+-----------+--------------+---------+------+----------------------+------------------------------------------------------------------------------------------------------+
2 rows in set (0.00 sec)

Explains are different on my local machine and on the server, and I'm assuming the only reason my query runs at all on my local machine, is because of the key that is automatically created on the derived table:

Explain (local):

+----+-------------+------------+------------+------+---------------+-------------+---------+------------------------------+---------+----------+---------------------------------+
| id | select_type | table      | partitions | type | possible_keys | key         | key_len | ref                          | rows    | filtered | Extra                           |
+----+-------------+------------+------------+------+---------------+-------------+---------+------------------------------+---------+----------+---------------------------------+
|  1 | PRIMARY     | p          | NULL       | ALL  | NULL          | NULL        | NULL    | NULL                         |   40319 |   100.00 | Using temporary                 |
|  1 | PRIMARY     | <derived2> | NULL       | ref  | <auto_key0>   | <auto_key0> | 8       | lead_housing.p.parcel_number |      40 |   100.00 | NULL                            |
|  2 | DERIVED     | pi         | NULL       | ALL  | NULL          | NULL        | NULL    | NULL                         | 1623978 |   100.00 | Using temporary; Using filesort |
+----+-------------+------------+------------+------+---------------+-------------+---------+------------------------------+---------+----------+---------------------------------+

Explain (server):

+----+-------------+------------+------+---------------+------+---------+------+---------+------------------------------------------+
| id | select_type | table      | type | possible_keys | key  | key_len | ref  | rows    | Extra                                    |
+----+-------------+------------+------+---------------+------+---------+------+---------+------------------------------------------+
|  1 | PRIMARY     | p          | ALL  | NULL          | NULL | NULL    | NULL |   41369 | Using temporary                          |
|  1 | PRIMARY     | <derived2> | ALL  | NULL          | NULL | NULL    | NULL |  122948 | Using where; Distinct; Using join buffer |
|  2 | DERIVED     | pi         | ALL  | NULL          | NULL | NULL    | NULL | 1718586 | Using temporary; Using filesort          |
+----+-------------+------------+------+---------------+------+---------+------+---------+------------------------------------------+

Schemas:

mysql> explain property_inspection;
+-------------------------+--------------+------+-----+-------------------+-----------------------------+
| Field                   | Type         | Null | Key | Default           | Extra                       |
+-------------------------+--------------+------+-----+-------------------+-----------------------------+
| id                      | int(11)      | NO   | PRI | NULL              | auto_increment              |
| lblCaseNo               | int(11)      | NO   | MUL | NULL              |                             |
| APN                     | bigint(10)   | NO   | MUL | NULL              |                             |
| date                    | varchar(50)  | NO   |     | NULL              |                             |
| status                  | varchar(500) | NO   |     | NULL              |                             |
| property_case_detail_id | int(11)      | YES  | MUL | NULL              |                             |
| case_type_id            | int(11)      | YES  | MUL | NULL              |                             |
| date_modified           | timestamp    | NO   |     | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP |
| update_status           | tinyint(1)   | YES  |     | 1                 |                             |
| created_date            | datetime     | NO   |     | NULL              |                             |
+-------------------------+--------------+------+-----+-------------------+-----------------------------+
10 rows in set (0.02 sec)

mysql> explain property; (not all columns, but you get the gist)
+----------------------------+--------------+------+-----+-------------------+-----------------------------+
| Field                      | Type         | Null | Key | Default           | Extra                       |
+----------------------------+--------------+------+-----+-------------------+-----------------------------+
| id                         | int(11)      | NO   | PRI | NULL              | auto_increment              |
| parcel_number              | bigint(10)   | NO   |     | 0                  |                             |
| date_modified              | timestamp    | NO   |     | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP |
| created_date               | datetime     | NO   |     | NULL              |                             |
+----------------------------+--------------+------+-----+-------------------+-----------------------------+

Variables that might be relevant:

tmp_table_size: 16777216
innodb_buffer_pool_size: 8589934592

Any ideas on how to optimize this, and any idea why the explains are so different?

Upvotes: 2

Views: 396

Answers (2)

Rick James
Rick James

Reputation: 142296

Since this is where the Optimizers are quite different, let's try to optimize

SELECT APN, property_case_detail_id FROM property_inspection AS pi
  GROUP BY APN, property_case_detail_id
  HAVING 
  COUNT(IF(status='Resolved Date', 1, NULL)) = 0
) as open_cases

Give this a try:

SELECT ...
    FROM property AS p
    WHERE NOT EXISTS ( SELECT 1 FROM property_inspection
                 WHERE status = 'Resolved Date'
                   AND p.parcel_number = APN )
    ORDER BY ???  -- without this, the `LIMIT` is unpredictable
    LIMIT 0, 1000;

or...

SELECT ...
    FROM property AS p
    LEFT JOIN  property_inspection AS pi  ON p.parcel_number = pi.APN
    WHERE pi.status = 'Resolved Date'
      AND pi.APN IS NULL
    ORDER BY ???  -- without this, the `LIMIT` is unpredictable
    LIMIT 0, 1000;

Index:

property_inspection:  INDEX(status, parcel_number) -- in either order

Upvotes: 1

fifonik
fifonik

Reputation: 1606

MySQL 5.5 and 5.7 are quite different and the later has better optimizer so there is no surprise that explain plans are different.

You'd better provide SHOW CREATE TABLE property; and SHOW CREATE TABLE property_inspection; outputs as it will show indexes that are on your tables.

Your sub-query is the issue. - Server tries to process 1.6M rows with no index and grouping everything. - Having is quite expensive operation so you'd better avoid it, expecially in sub-queries. - Grouping in this case is bad idea. You do not need the aggregation/counting. You need to check if the 'Resolved Date' status is just exists

Based on the information provided I'd recommend: - Alter table property_inspection to reduce length of status column. - Add index on the column. Use covering index (APN, property_case_detail_id, status) if possible (in this columns order). - Change query to something like this:

SELECT
    SQL_CALC_FOUND_ROWS
    DISTINCT p.parcel_number,
    ...
    p.id
FROM
    property_inspection AS `pi1`
    INNER JOIN property AS p ON (
        p.parcel_number = `pi1`.APN
    )
    LEFT JOIN (
        SELECT
              `pi2`.property_case_detail_id
            , `pi2`. APN
        FROM
            property_inspection AS `pi2`
        WHERE
            `status` = 'Resolved Date'
    ) AS exclude ON (
        exclude.APN = `pi1`.APN
        AND exclude.property_case_detail_id = `pi1`.property_case_detail_id
    )
WHERE
    exclude.APN IS NULL
LIMIT
    0, 1000;

Upvotes: 1

Related Questions