Nishant123
Nishant123

Reputation: 1966

MySQL shows "possible_keys" but does not use it

I have a table with more than a million entries and around 42 columns. I am trying to run SELECT query on this table which takes a minute to execute. In order to reduce the query execution time I added an index on the table, but the index is not being used.

The table structure is as follows. Though the table has 42 columns I am only showing here those that are relevant to my query

CREATE TABLE `tas_usage` (
  `uid` int(11) NOT NULL AUTO_INCREMENT,
  `userid` varchar(255) DEFAULT NULL,
  `companyid` varchar(255) DEFAULT NULL,
  `SERVICE` varchar(2000) DEFAULT NULL,
  `runstatus` varchar(255) DEFAULT NULL,
  `STATUS` varchar(2000) DEFAULT NULL,
  `servertime` datetime DEFAULT NULL,
  `machineId` varchar(2000) DEFAULT NULL,
  PRIMARY KEY (`uid`)
) ENGINE=InnoDB AUTO_INCREMENT=2992891 DEFAULT CHARSET=latin1

The index that I have added is as follows

ALTER TABLE TAS_USAGE ADD INDEX last_quarter (SERVERTIME,COMPANYID(20),MACHINEID(20),SERVICE(50),RUNSTATUS(10));

My SELECT Query

EXPLAIN SELECT DISTINCT t1.COMPANYID, t1.USERID, t1.MACHINEID FROM TAS_USAGE t1 
LEFT JOIN TAS_INVALID_COMPANY INVL ON INVL.COMPANYID = t1.COMPANYID
LEFT JOIN TAS_INVALID_MACHINE INVL_MAC_ID ON INVL_MAC_ID.MACHINEID = t1.MACHINEID
WHERE t1.SERVERTIME >= '2018-10-01 00:00:00' AND t1.SERVERTIME <= '2018-12-31 00:00:00' AND 
INVL.companyId IS NULL AND INVL_MAC_ID.machineId IS NULL AND 
t1.SERVICE NOT IN ('credentialtest%', 'webupdate%') AND  
t1.RUNSTATUS NOT IN ('Failed', 'Failed Failed', 'Failed Success', 'Success Failed', '');

EXPLAIN result is as follows

+----+-------------+-------------+------------+--------+-----------------------+-----------------------+---------+-----------------------------+---------+----------+------------------------------------------------+
| id | select_type | table       | partitions | type   | possible_keys         | key                   | key_len | ref                         | rows    | filtered | Extra                                          |
+----+-------------+-------------+------------+--------+-----------------------+-----------------------+---------+-----------------------------+---------+----------+------------------------------------------------+
|  1 | SIMPLE      | t1          | NULL       | ALL    | last_quarter          | NULL                  | NULL    | NULL                        | 1765296 |    15.68 | Using where; Using temporary                   |
|  1 | SIMPLE      | INVL        | NULL       | ref    | invalid_company_index | invalid_company_index | 502     | servicerunprod.t1.companyid |       1 |   100.00 | Using where; Not exists; Using index; Distinct |
|  1 | SIMPLE      | INVL_MAC_ID | NULL       | eq_ref | machineId             | machineId             | 502     | servicerunprod.t1.machineId |       1 |   100.00 | Using where; Not exists; Using index; Distinct |
+----+-------------+-------------+------------+--------+-----------------------+-----------------------+---------+-----------------------------+---------+----------+------------------------------------------------+

Explanation of my Query

I want to select all the records from table TAS_USAGE

  1. which are between date range(including) 1st October 2018 and 31st Dec 2018 AND
  2. which do not have columns COMPANYID and MACHINEID matching in tables TAS_INVALID_COMPANYand TAS_INVALID_MACHINE AND
  3. which do not contain values ('credentialtest%', 'webupdate%') in SERVICE column and values ('Failed', 'Failed Failed', 'Failed Success', 'Success Failed', '') in RUNSTATUS column

Upvotes: 0

Views: 2594

Answers (3)

Rick James
Rick James

Reputation: 142298

   WHERE  t1.SERVERTIME >= '2018-10-01 00:00:00'
     AND  t1.SERVERTIME <= '2018-12-31 00:00:00'

is strange. It covers 3 months minus 1 day plus 1 second. Suggest you rephrase thus:

   WHERE  t1.SERVERTIME >= '2018-10-01'
     AND  t1.SERVERTIME  < '2018-10-01' + INTERVAL 3 MONTH

There are multiple possible reasons why the INDEX(servertime, ...) was not used and/or was not "useful" even if used:

  • If more than perhaps 20% of the table involved that daterange, using the index is likely to be less efficient than simply scanning the table. Using the index would involve bouncing between the index's BTree and the data's BTree.
  • Starting an index with a 'range' means that the rest of the index will not be used.
  • Index "prefixing" (foo(10)) is next to useless.

What you can do:

  • Normalize most of those string columns. How many "machines" do you have? Probably nowhere near 3 million. By replacing repeated strings with a small id (perhaps a 2-byte SMALLINT UNSIGNED with a max of 65K) will save a lot of space in this table. This, in turn, will speed up the query, and eliminate the desire for index prefixing.
  • If Normalizing is not practical because there really are upwards of 3 million distinct values, then see if shortening the VARCHAR. If you get it under 255, prefixing is no longer needed.
  • NOT IN is not optimizable. If you can invert the test and make it IN(...), more possibilities open up, such as INDEX(service, runstatus, servertime). If you have a new enough version of MySQL, I think the optimizer will hop around in the index on the two IN columns and use the index for the time range.
  • NOT IN ('credentialtest%', 'webupdate%') -- Is % part of the string? If you are using % as a wildcard, that construct will not work. You would need two LIKE clauses.

Reformulate the query thus:

SELECT   t1.COMPANYID, t1.USERID, t1.MACHINEID
    FROM  TAS_USAGE t1
    WHERE  t1.SERVERTIME >= '2018-10-01'
      AND  t1.SERVERTIME  < '2018-10-01' + INTERVAL 3 MONTH
      AND  t1.SERVICE NOT IN ('credentialtest%', 'webupdate%')
      AND  t1.RUNSTATUS NOT IN ('Failed', 'Failed Failed',
                                'Failed Success', 'Success Failed', '')
      AND NOT EXISTS( SELECT 1 FROM  TAS_INVALID_COMPANY WHERE companyId = t1.COMPANYID )
      AND NOT EXISTS( SELECT 1 FROM  TAS_INVALID_MACHINE WHERE MACHINEID = t1.MACHINEID );

If the trio t1.COMPANYID, t1.USERID, t1.MACHINEID is unique, then get rid of DISTINCT.

Since there are only 6 (of 42) columns being used in this query, building a "covering" index will probably help:

INDEX(SERVERTIME, SERVICE, RUNSTATUS, COMPANYID, USERID, MACHINEID)

This is because the query can be performed entirely withing the index. In this case, I deliberately put the range first.

Upvotes: 1

Solarflare
Solarflare

Reputation: 11106

Focussing on the date range, MySQL basically has two options :

  1. read the complete table consecutively and throw away records that do not fit the date range

  2. use the index to identify the records in the date range and then look up each record in the table (using the primary key) individually ("random access")

Consecutive reads are significantly faster than random access, but you need to read more data. There will be some break-even point at which using an index will become slower than just simply reading everything, and MySQL assumes this is the case here. If that's the right choice will largely depend on how correctly it guessed how many records are actually in the range. If you make the range smaller, it should actually use the index at some point.

If you know that (or want to test if) using the index is faster, you can force MySQL to use it with

... FROM TAS_USAGE t1 force index (last_quarter) LEFT JOIN ...

You should test it with different ranges, and if you generate your query dynamically, only force the index when you are decently certain (as MySQL will not correct you if you e.g. specify a range that would include all rows).

There is one important way around the slow random access to the table, although it unfortunately does not work with your prefixed index, but I mention it in case you can reduce your field sizes (or change them to lookups/enums). You can include every column that MySQL needs to evaluate the query by using a covering index:

An index that includes all the columns retrieved by a query. Instead of using the index values as pointers to find the full table rows, the query returns values from the index structure, saving disk I/O.

As mentioned, since in a prefixed index, part of the data is missing, those columns unfortunately cannot be used to cover though.

Actually, they also cannot be used for much at all, especially not to filter records before doing the random access, as to evaluate your where-condition for RUNSTATUS or SERVICE, the complete value is required anyway. So you could check if e.g. RUNSTATUS is very significant - maybe 99% of your records are in status 'Failed' - and in that case add an unprefixed filter for just (SERVERTIME, RUNSTATUS) (and MySQL might even pick that index then on its own).

Upvotes: 1

Shadow
Shadow

Reputation: 34231

The distinct clause is the one that interferes with the index usage. Since the index cannot be used to help with the distinct, mysql decided against the use of index completely.

If you rearrange the order of fields in the select list, in the index, and in the where clause, mysql may decide to use it:

ALTER TABLE TAS_USAGE ADD INDEX last_quarter (COMPANYID(20),MACHINEID(20), SERVERTIME, SERVICE(50),RUNSTATUS(10));


SELECT DISTINCT t1.COMPANYID, t1.MACHINEID, t1.USERID  FROM TAS_USAGE t1 
    LEFT JOIN TAS_INVALID_COMPANY INVL ON INVL.COMPANYID = t1.COMPANYID
    LEFT JOIN TAS_INVALID_MACHINE INVL_MAC_ID ON INVL_MAC_ID.MACHINEID = t1.MACHINEID
    WHERE 
    INVL.companyId IS NULL AND INVL_MAC_ID.machineId IS NULL AND 
    t1.SERVERTIME >= '2018-10-01 00:00:00' AND t1.SERVERTIME <= '2018-12-31 00:00:00' AND
    t1.SERVICE NOT IN ('credentialtest%', 'webupdate%') AND  
    t1.RUNSTATUS NOT IN ('Failed', 'Failed Failed', 'Failed Success', 'Success Failed', '');

This way COMPANYID, MACHINEID fields become the leftmost fields in the distinct, where, and index - although the prefix may result in the index still to be discarded. You may want to consider reducing your varchar(255) fields.

Upvotes: 0

Related Questions