Rams
Rams

Reputation: 2189

MySql not picking correct index for few queries

I'm running follwing query on the table, I'm changing values in the where condition, while running in one case it's taking one index and another case taking it's another(wrong??) index.

row count for query 1 is 402954 it's taking approx 1.5 sec

row count for query 2 is 52097 it's taking approx 35 sec

Both queries query 1 and query 2 are same , only I'm changing values in the where condition

query 1

EXPLAIN SELECT 
     log_type,count(DISTINCT subscriber_id) AS distinct_count,
     count(subscriber_id) as total_count 
FROM campaign_logs 
WHERE 
    domain = 'xxx' AND 
    campaign_id='123' AND 
    log_type IN ('EMAIL_SENT', 'EMAIL_CLICKED', 'EMAIL_OPENED', 'UNSUBSCRIBED') AND 
    log_time BETWEEN 
       CONVERT_TZ('2015-02-12 00:00:00','+05:30','+00:00') AND
       CONVERT_TZ('2015-02-19 23:59:58','+05:30','+00:00') 
GROUP BY log_type;

EXPLAIN of above query

+----+-------------+---------------+-------+------------------------------------------------------------------------------------------------------+-----------------------------------------+---------+------+--------+-------------+
| id | select_type | table         | type  | possible_keys                                                                                        | key                                     | key_len | ref  | rows   | Extra       |
+----+-------------+---------------+-------+------------------------------------------------------------------------------------------------------+-----------------------------------------+---------+------+--------+-------------+
|  1 | SIMPLE      | campaign_logs | range | campaign_id_index,domain_index,log_type_index,log_time_index,campaignid_domain_logtype_logtime_index | campaignid_domain_logtype_logtime_index | 468     | NULL | 402954 | Using where |
+----+-------------+---------------+-------+------------------------------------------------------------------------------------------------------+-----------------------------------------+---------+------+--------+-------------+

query 2

EXPLAIN SELECT 
    log_type,count(DISTINCT subscriber_id) AS distinct_count,
    count(subscriber_id) as total_count 
FROM stats.campaign_logs 
WHERE 
    domain = 'yyy' AND 
    campaign_id='345' AND 
    log_type IN ('EMAIL_SENT', 'EMAIL_CLICKED', 'EMAIL_OPENED', 'UNSUBSCRIBED') AND 
    log_time BETWEEN 
         CONVERT_TZ('2014-02-05 00:00:00','+05:30','+00:00') AND
         CONVERT_TZ('2015-02-19 23:59:58','+05:30','+00:00') 
GROUP BY log_type;

explain of above query

+----+-------------+---------------+-------------+------------------------------------------------------------------------------------------------------+--------------------------------+---------+------+-------+------------------------------------------------------------------------------+
| id | select_type | table         | type        | possible_keys                                                                                        | key                            | key_len | ref  | rows  | Extra                                                                        |
+----+-------------+---------------+-------------+------------------------------------------------------------------------------------------------------+--------------------------------+---------+------+-------+------------------------------------------------------------------------------+
|  1 | SIMPLE      | campaign_logs | index_merge | campaign_id_index,domain_index,log_type_index,log_time_index,campaignid_domain_logtype_logtime_index | campaign_id_index,domain_index | 153,153 | NULL | 52097 | Using intersect(campaign_id_index,domain_index); Using where; Using filesort |
+----+-------------+---------------+-------------+------------------------------------------------------------------------------------------------------+--------------------------------+---------+------+-------+------------------------------------------------------------------------------+

Query 1 is using correct index because I have composite index

Query 2 is using index merge , it's taking long time to execute

Why MySql using different indexes for same query

I know we can mention USE INDEX in the query , but why MySql is not picking correct index in this case??. am I doing anything wrong??

Upvotes: 1

Views: 2968

Answers (2)

Marcus Adams
Marcus Adams

Reputation: 53880

No, you're not doing anything wrong.

As Chipmonkey stated in comments, sometimes MySQL will choose the wrong execution plan because of outdated table statistics. You can update the table statistics by performing ANALYZE TABLE.

Still, MySQL optimizer isn't that sophisticated. It sees that in both cases, MySQL will have to visit both the secondary index and then perform a lookup to the clustered index to get the actual table data, so when it saw that perhaps the second query had better selectivity by using the two separate indexes and merging them, you can't blame it too much just because it guessed wrong.

I'm guessing that if you had a covering index so that MySQL could perform the entire query with just the index, it will favor that index over performing a merge.

Try adding subscriber_id to the end of your multi-column index to get a covering index.

Otherwise, use USE INDEX or FORCE INDEX, because that's what they're there for. You know more about the data than MySQL does.

Upvotes: 4

O. Jones
O. Jones

Reputation: 108839

I suggest you try this:

Add this permutation of your compound index.

 (campaign_id,domain,log_time,log_type,subscriber_id)

Change your query to remove the WHERE log_type IN() criterion, thus allowing the aggregate function to use all the records it finds in the range scan on log_time. Including subscriber_id in the index should allow the whole query to be satisfied directly from the index. That is, this is a covering index.

Finally, you can filter on your log_type values by wrapping the whole query in

  SELECT *
    FROM (/*the whole query*/) x
   WHERE log_type IN 
        ('EMAIL_SENT', 'EMAIL_CLICKED', 'EMAIL_OPENED', 'UNSUBSCRIBED')
   ORDER BY log_type

This should give you better, and more predictable, performance.

(Unless the log_types you want are a tiny subset of the records, in which case please ignore this suggestion.)

Upvotes: 1

Related Questions