Mathijs de Jong
Mathijs de Jong

Reputation: 183

Optimizing slow MySQL select query

EDIT: After looking at some of the answers here and hours of research, my team came to the conclusion there was most likely no way to optimize this further than the 4.5 seconds we were able to achieve (unless maybe with partitioning on offers_clicks, but that would have some ugly side-effects). Eventually, after lots of brainstorming, we decided to split both queries, create two sets of user ids (one from users table and one from offers_clicks), and compare them with set in Python. The set of ids from users table is still pulled from SQL, but we decided to move offers_clicks to Lucene and also added some caching on top of it, so that's where the other set of ids is now pulled from. The end result is that its down to about half a second with cache and 0.9s without cache.

Start of original post: I have trouble getting a query optimized. The first version of the query is fine, but the moment offers_clicks is joined in the 2nd query, the query becomes rather slow. Users table contains 10 million rows, offers_clicks contains 53 million rows.

Acceptable performance:

SELECT count(distinct(users.id)) AS count_1
FROM users USE index (country_2)
WHERE users.country = 'US'
  AND users.last_active > '2015-02-26';
1 row in set (0.35 sec)

Bad:

SELECT count(distinct(users.id)) AS count_1
FROM offers_clicks USE index (user_id_3), users USE index (country_2)
WHERE users.country = 'US'
  AND users.last_active > '2015-02-26'
  AND offers_clicks.user_id = users.id
  AND offers_clicks.date > '2015-02-14'
  AND offers_clicks.ranking_score < 3.49
  AND offers_clicks.ranking_score > 0.24;
1 row in set (7.39 sec)

Here's how it looks without specificying any indexes (even worse):

SELECT count(distinct(users.id)) AS count_1
FROM offers_clicks, users
WHERE users.country IN ('US')
  AND users.last_active > '2015-02-26'
  AND offers_clicks.user_id = users.id
  AND offers_clicks.date > '2015-02-14'
  AND offers_clicks.ranking_score < 3.49
  AND offers_clicks.ranking_score > 0.24;
1 row in set (17.72 sec)

Explain:

explain SELECT count(distinct(users.id)) AS count_1 FROM offers_clicks USE index (user_id_3), users USE index (country_2) WHERE users.country IN ('US') AND users.last_active > '2015-02-26' AND offers_clicks.user_id = users.id AND offers_clicks.date > '2015-02-14' AND offers_clicks.ranking_score < 3.49 AND offers_clicks.ranking_score > 0.24;
+----+-------------+---------------+-------+---------------+-----------+---------+------------------------------+--------+--------------------------+
| id | select_type | table         | type  | possible_keys | key       | key_len | ref                          | rows   | Extra                    |
+----+-------------+---------------+-------+---------------+-----------+---------+------------------------------+--------+--------------------------+
|  1 | SIMPLE      | users         | range | country_2     | country_2 | 14      | NULL                         | 245014 | Using where; Using index |
|  1 | SIMPLE      | offers_clicks | ref   | user_id_3     | user_id_3 | 4       | dejong_pointstoshop.users.id | 270153 | Using where; Using index |
+----+-------------+---------------+-------+---------------+-----------+---------+------------------------------+--------+--------------------------+

Explain without specifying any indexes:

mysql> explain SELECT count(distinct(users.id)) AS count_1 FROM offers_clicks, users WHERE users.country IN ('US') AND users.last_active > '2015-02-26' AND offers_clicks.user_id = users.id AND offers_clicks.date > '2015-02-14' AND offers_clicks.ranking_score < 3.49 AND offers_clicks.ranking_score > 0.24;
+----+-------------+---------------+-------+------------------------------------------------------------------------+-----------+---------+------------------------------+--------+--------------------------+
| id | select_type | table         | type  | possible_keys                                                          | key       | key_len | ref                          | rows   | Extra                    |
+----+-------------+---------------+-------+------------------------------------------------------------------------+-----------+---------+------------------------------+--------+--------------------------+
|  1 | SIMPLE      | users         | range | PRIMARY,last_active,country,last_active_2,country_2                    | country_2 | 14      | NULL                         | 221606 | Using where; Using index |
|  1 | SIMPLE      | offers_clicks | ref   | user_id,user_id_2,date,date_2,date_3,ranking_score,user_id_3,user_id_4 | user_id_2 | 4       | dejong_pointstoshop.users.id |      3 | Using where              |
+----+-------------+---------------+-------+------------------------------------------------------------------------+-----------+---------+------------------------------+--------+--------------------------+

Here's a whole bunch of indexes I tried with not too much success:

+---------------+------------+-----------------------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table         | Non_unique | Key_name                    | Seq_in_index | Column_name     | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+---------------+------------+-----------------------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| offers_clicks |          1 | user_id_3                   |            1 | user_id         | A         |         198 |     NULL | NULL   |      | BTREE      |         |               |
| offers_clicks |          1 | user_id_3                   |            2 | ranking_score   | A         |         198 |     NULL | NULL   |      | BTREE      |         |               |
| offers_clicks |          1 | user_id_3                   |            3 | date            | A         |         198 |     NULL | NULL   |      | BTREE      |         |               |
| offers_clicks |          1 | user_id_2                   |            1 | user_id         | A         |    17838712 |     NULL | NULL   |      | BTREE      |         |               |
| offers_clicks |          1 | user_id_2                   |            2 | date            | A         |    53516137 |     NULL | NULL   |      | BTREE      |         |               |
| offers_clicks |          1 | user_id_4                   |            1 | user_id         | A         |         198 |     NULL | NULL   |      | BTREE      |         |               |
| offers_clicks |          1 | user_id_4                   |            2 | date            | A         |         198 |     NULL | NULL   |      | BTREE      |         |               |
| offers_clicks |          1 | user_id_4                   |            3 | ranking_score   | A         |         198 |     NULL | NULL   |      | BTREE      |         |               |
| users         |          1 | country_2                   |            1 | country         | A         |          14 |     NULL | NULL   |      | BTREE      |         |               |
| users         |          1 | country_2                   |            2 | last_active     | A         |     8048529 |     NULL | NULL   |      | BTREE      |         |               |

Simplified users schema:

+---------------------------------+---------------+------+-----+---------------------+----------------+
| Field                           | Type          | Null | Key | Default             | Extra          |
+---------------------------------+---------------+------+-----+---------------------+----------------+
| id                              | int(11)       | NO   | PRI | NULL                | auto_increment |
| country                         | char(2)       | NO   | MUL |                     |                |
| last_active                     | datetime      | NO   | MUL | 2000-01-01 00:00:00 |                |

Simplified offers clicks schema:

+-----------------+------------------+------+-----+---------------------+----------------+
| Field           | Type             | Null | Key | Default             | Extra          |
+-----------------+------------------+------+-----+---------------------+----------------+
| id              | int(11)          | NO   | PRI | NULL                | auto_increment |
| user_id         | int(11)          | NO   | MUL | 0                   |                |
| offer_id        | int(11) unsigned | NO   | MUL | NULL                |                |
| date            | datetime         | NO   | MUL | 0000-00-00 00:00:00 |                |
| ranking_score   | decimal(5,2)     | NO   | MUL | 0.00                |                |

Upvotes: 6

Views: 437

Answers (6)

Bohemian
Bohemian

Reputation: 425033

Try this:

SELECT count(distinct users.id) AS count_1
FROM users USE index (<see below>)
JOIN offers_clicks USE index (<see below>)
    ON offers_clicks.user_id = users.id
    AND offers_clicks.date BETWEEN '2015-02-14' AND CURRENT_DATE
    AND offers_clicks.ranking_score BETWEEN 0.24 AND 3.49
WHERE users.country = 'US'
AND users.last_active BETWEEN '2015-02-26' AND CURRENT_DATE

Make sure there are indexes on users(country, last_active, id) and offers_clicks(user_id, ranking_score, date) and USE them.

Let me know how it performs and if it works I'll explain why.

Upvotes: 0

cn0047
cn0047

Reputation: 17071

First of all i also think that you should use join, and try to join only rows that you really need in result.
As for table offers_clicks i think you should not use index user_id_3 and use user_id_2 because the cardinality of user_id_2 is higher than cardinality of user_id_3 (accordingly to your indexes) and it should be faster.

SELECT
    count(distinct(users.id)) AS count_1
FROM users USE INDEX (country_2)
JOIN offers_clicks USE INDEX (user_id_2)
    ON  offers_clicks.user_id = users.id
    AND offers_clicks.date > '2015-02-14'
    AND offers_clicks.ranking_score < 3.49
    AND offers_clicks.ranking_score > 0.24
WHERE users.country = 'US' AND users.last_active > '2015-02-26'
;

For this query you don't need altering table, that's why i think you can try it.
Maybe will be helpful to try decrease date range, and as result to decrease rows count in result, it should be faster.

Not sure that i will be helpful...

Upvotes: 0

Ergo
Ergo

Reputation: 1234

Try doing this other way around:

SELECT COUNT(users.id)
    FROM users, offers_clicks
    WHERE users.country = 'US'
        AND users.last_active > '2015-02-26'
        AND offers_clicks.user_id = users.id
        AND offers_clicks.date > '2015-02-14'
        AND offers_clicks.ranking_score < 3.49
        AND offers_clicks.ranking_score > 0.24;

Upvotes: 0

Alex
Alex

Reputation: 17289

SELECT count(users.id) AS count_1 
FROM users 
INNER JOIN
  (SELECT
    DISTINCT user_id
  FROM
    offers_clicks
  WHERE offers_clicks.date > '2015-02-14' 
    AND offers_clicks.ranking_score < 3.49 
    AND offers_clicks.ranking_score > 0.24
  ) as clicks
ON clicks.user_id  = users.id
WHERE users.country IN ('US') 
    AND users.last_active > '2015-02-26' 

could you provide sqlfiddle with some data please?

and could you tell me what is execution time for this query:

SELECT
    DISTINCT user_id
  FROM
    offers_clicks
  WHERE offers_clicks.date > '2015-02-14' 
    AND offers_clicks.ranking_score < 3.49 
    AND offers_clicks.ranking_score > 0.24

EDIT QUESTION How long takes this one?

SELECT
    DISTINCT user_id
  FROM
    offers_clicks USE INDEX (user_id_4)
  WHERE offers_clicks.date > '2015-02-14' 
    AND offers_clicks.ranking_score < 3.49 
    AND offers_clicks.ranking_score > 0.24

Upvotes: 0

DBHash.com
DBHash.com

Reputation: 584

SELECT count(distinct u.id) AS count_1
FROM users u
STRAIGHT_JOIN offers_clicks oc
     ON oc.user_id = u.id
WHERE 
    u.country IN ('US') 
    AND u.last_active > '2015-02-26' 
    AND oc.date > '2015-02-14' 
    AND oc.ranking_score > 0.24 
    AND oc.ranking_score < 3.49;

Make sure you have index on users - (id,last_active,country) columns and offers_clicks - (user_id,date,ranking_score)

Or you can reverse the order

SELECT count(distinct u.id) AS count_1
FROM offers_clicks oc 
STRAIGHT_JOIN users u
     ON oc.user_id = u.id
WHERE 
    u.country IN ('US') 
    AND u.last_active > '2015-02-26' 
    AND oc.date > '2015-02-14' 
    AND oc.ranking_score > 0.24 
    AND oc.ranking_score < 3.49;

Make sure you have index on offers_clicks - (user_id) column and users - (id,last_active,country)

Upvotes: 1

Gordon Linoff
Gordon Linoff

Reputation: 1269773

This is your query:

SELECT count(distinct u.id) AS count_1
FROM offers_clicks oc JOIN
     users u
     ON oc.user_id = u.id
WHERE u.country IN ('US') AND u.last_active > '2015-02-26' AND
      oc.date > '2015-02-14' AND
      oc.ranking_score > 0.24 AND oc.ranking_score < 3.49;

First, instead of count(distinct), you might consider writing the query as:

SELECT count(*) AS count_1
FROM users u
WHERE u.country IN ('US') AND u.last_active > '2015-02-26' AND
      EXISTS (SELECT 1
              FROM offers_clicks oc
              WHERE oc.user_id = u.id AND
                    oc.date > '2015-02-14' AND
                    oc.ranking_score > 0.24 AND oc.ranking_score < 3.49
             )

Then, the best indexes for this query are: users(country, last_active, id) and either offers_clicks(user_id, date, ranking_score) or offers_clicks(user_id, ranking_score, date).

Upvotes: 5

Related Questions