Reputation: 183
EDIT: After looking at some of the answers here and hours of research, my team came to the conclusion there was most likely no way to optimize this further than the 4.5 seconds we were able to achieve (unless maybe with partitioning on offers_clicks, but that would have some ugly side-effects). Eventually, after lots of brainstorming, we decided to split both queries, create two sets of user ids (one from users table and one from offers_clicks), and compare them with set in Python. The set of ids from users table is still pulled from SQL, but we decided to move offers_clicks to Lucene and also added some caching on top of it, so that's where the other set of ids is now pulled from. The end result is that its down to about half a second with cache and 0.9s without cache.
Start of original post: I have trouble getting a query optimized. The first version of the query is fine, but the moment offers_clicks is joined in the 2nd query, the query becomes rather slow. Users table contains 10 million rows, offers_clicks contains 53 million rows.
Acceptable performance:
SELECT count(distinct(users.id)) AS count_1
FROM users USE index (country_2)
WHERE users.country = 'US'
AND users.last_active > '2015-02-26';
1 row in set (0.35 sec)
Bad:
SELECT count(distinct(users.id)) AS count_1
FROM offers_clicks USE index (user_id_3), users USE index (country_2)
WHERE users.country = 'US'
AND users.last_active > '2015-02-26'
AND offers_clicks.user_id = users.id
AND offers_clicks.date > '2015-02-14'
AND offers_clicks.ranking_score < 3.49
AND offers_clicks.ranking_score > 0.24;
1 row in set (7.39 sec)
Here's how it looks without specificying any indexes (even worse):
SELECT count(distinct(users.id)) AS count_1
FROM offers_clicks, users
WHERE users.country IN ('US')
AND users.last_active > '2015-02-26'
AND offers_clicks.user_id = users.id
AND offers_clicks.date > '2015-02-14'
AND offers_clicks.ranking_score < 3.49
AND offers_clicks.ranking_score > 0.24;
1 row in set (17.72 sec)
Explain:
explain SELECT count(distinct(users.id)) AS count_1 FROM offers_clicks USE index (user_id_3), users USE index (country_2) WHERE users.country IN ('US') AND users.last_active > '2015-02-26' AND offers_clicks.user_id = users.id AND offers_clicks.date > '2015-02-14' AND offers_clicks.ranking_score < 3.49 AND offers_clicks.ranking_score > 0.24;
+----+-------------+---------------+-------+---------------+-----------+---------+------------------------------+--------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------------+-------+---------------+-----------+---------+------------------------------+--------+--------------------------+
| 1 | SIMPLE | users | range | country_2 | country_2 | 14 | NULL | 245014 | Using where; Using index |
| 1 | SIMPLE | offers_clicks | ref | user_id_3 | user_id_3 | 4 | dejong_pointstoshop.users.id | 270153 | Using where; Using index |
+----+-------------+---------------+-------+---------------+-----------+---------+------------------------------+--------+--------------------------+
Explain without specifying any indexes:
mysql> explain SELECT count(distinct(users.id)) AS count_1 FROM offers_clicks, users WHERE users.country IN ('US') AND users.last_active > '2015-02-26' AND offers_clicks.user_id = users.id AND offers_clicks.date > '2015-02-14' AND offers_clicks.ranking_score < 3.49 AND offers_clicks.ranking_score > 0.24;
+----+-------------+---------------+-------+------------------------------------------------------------------------+-----------+---------+------------------------------+--------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------------+-------+------------------------------------------------------------------------+-----------+---------+------------------------------+--------+--------------------------+
| 1 | SIMPLE | users | range | PRIMARY,last_active,country,last_active_2,country_2 | country_2 | 14 | NULL | 221606 | Using where; Using index |
| 1 | SIMPLE | offers_clicks | ref | user_id,user_id_2,date,date_2,date_3,ranking_score,user_id_3,user_id_4 | user_id_2 | 4 | dejong_pointstoshop.users.id | 3 | Using where |
+----+-------------+---------------+-------+------------------------------------------------------------------------+-----------+---------+------------------------------+--------+--------------------------+
Here's a whole bunch of indexes I tried with not too much success:
+---------------+------------+-----------------------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+---------------+------------+-----------------------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| offers_clicks | 1 | user_id_3 | 1 | user_id | A | 198 | NULL | NULL | | BTREE | | |
| offers_clicks | 1 | user_id_3 | 2 | ranking_score | A | 198 | NULL | NULL | | BTREE | | |
| offers_clicks | 1 | user_id_3 | 3 | date | A | 198 | NULL | NULL | | BTREE | | |
| offers_clicks | 1 | user_id_2 | 1 | user_id | A | 17838712 | NULL | NULL | | BTREE | | |
| offers_clicks | 1 | user_id_2 | 2 | date | A | 53516137 | NULL | NULL | | BTREE | | |
| offers_clicks | 1 | user_id_4 | 1 | user_id | A | 198 | NULL | NULL | | BTREE | | |
| offers_clicks | 1 | user_id_4 | 2 | date | A | 198 | NULL | NULL | | BTREE | | |
| offers_clicks | 1 | user_id_4 | 3 | ranking_score | A | 198 | NULL | NULL | | BTREE | | |
| users | 1 | country_2 | 1 | country | A | 14 | NULL | NULL | | BTREE | | |
| users | 1 | country_2 | 2 | last_active | A | 8048529 | NULL | NULL | | BTREE | | |
Simplified users schema:
+---------------------------------+---------------+------+-----+---------------------+----------------+
| Field | Type | Null | Key | Default | Extra |
+---------------------------------+---------------+------+-----+---------------------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| country | char(2) | NO | MUL | | |
| last_active | datetime | NO | MUL | 2000-01-01 00:00:00 | |
Simplified offers clicks schema:
+-----------------+------------------+------+-----+---------------------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------------+------------------+------+-----+---------------------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| user_id | int(11) | NO | MUL | 0 | |
| offer_id | int(11) unsigned | NO | MUL | NULL | |
| date | datetime | NO | MUL | 0000-00-00 00:00:00 | |
| ranking_score | decimal(5,2) | NO | MUL | 0.00 | |
Upvotes: 6
Views: 437
Reputation: 425033
Try this:
SELECT count(distinct users.id) AS count_1
FROM users USE index (<see below>)
JOIN offers_clicks USE index (<see below>)
ON offers_clicks.user_id = users.id
AND offers_clicks.date BETWEEN '2015-02-14' AND CURRENT_DATE
AND offers_clicks.ranking_score BETWEEN 0.24 AND 3.49
WHERE users.country = 'US'
AND users.last_active BETWEEN '2015-02-26' AND CURRENT_DATE
Make sure there are indexes on users(country, last_active, id)
and offers_clicks(user_id, ranking_score, date)
and USE
them.
Let me know how it performs and if it works I'll explain why.
Upvotes: 0
Reputation: 17071
First of all i also think that you should use join, and try to join only rows that you really need in result.
As for table offers_clicks i think you should not use index user_id_3 and use user_id_2
because the cardinality of user_id_2 is higher than cardinality of user_id_3 (accordingly to your indexes)
and it should be faster.
SELECT
count(distinct(users.id)) AS count_1
FROM users USE INDEX (country_2)
JOIN offers_clicks USE INDEX (user_id_2)
ON offers_clicks.user_id = users.id
AND offers_clicks.date > '2015-02-14'
AND offers_clicks.ranking_score < 3.49
AND offers_clicks.ranking_score > 0.24
WHERE users.country = 'US' AND users.last_active > '2015-02-26'
;
For this query you don't need altering table, that's why i think you can try it.
Maybe will be helpful to try decrease date range, and as result to decrease rows count in result, it should be faster.
Not sure that i will be helpful...
Upvotes: 0
Reputation: 1234
Try doing this other way around:
SELECT COUNT(users.id)
FROM users, offers_clicks
WHERE users.country = 'US'
AND users.last_active > '2015-02-26'
AND offers_clicks.user_id = users.id
AND offers_clicks.date > '2015-02-14'
AND offers_clicks.ranking_score < 3.49
AND offers_clicks.ranking_score > 0.24;
Upvotes: 0
Reputation: 17289
SELECT count(users.id) AS count_1
FROM users
INNER JOIN
(SELECT
DISTINCT user_id
FROM
offers_clicks
WHERE offers_clicks.date > '2015-02-14'
AND offers_clicks.ranking_score < 3.49
AND offers_clicks.ranking_score > 0.24
) as clicks
ON clicks.user_id = users.id
WHERE users.country IN ('US')
AND users.last_active > '2015-02-26'
could you provide sqlfiddle with some data please?
and could you tell me what is execution time for this query:
SELECT
DISTINCT user_id
FROM
offers_clicks
WHERE offers_clicks.date > '2015-02-14'
AND offers_clicks.ranking_score < 3.49
AND offers_clicks.ranking_score > 0.24
EDIT QUESTION How long takes this one?
SELECT
DISTINCT user_id
FROM
offers_clicks USE INDEX (user_id_4)
WHERE offers_clicks.date > '2015-02-14'
AND offers_clicks.ranking_score < 3.49
AND offers_clicks.ranking_score > 0.24
Upvotes: 0
Reputation: 584
SELECT count(distinct u.id) AS count_1
FROM users u
STRAIGHT_JOIN offers_clicks oc
ON oc.user_id = u.id
WHERE
u.country IN ('US')
AND u.last_active > '2015-02-26'
AND oc.date > '2015-02-14'
AND oc.ranking_score > 0.24
AND oc.ranking_score < 3.49;
Make sure you have index on users - (id
,last_active
,country
) columns
and offers_clicks - (user_id
,date
,ranking_score
)
Or you can reverse the order
SELECT count(distinct u.id) AS count_1
FROM offers_clicks oc
STRAIGHT_JOIN users u
ON oc.user_id = u.id
WHERE
u.country IN ('US')
AND u.last_active > '2015-02-26'
AND oc.date > '2015-02-14'
AND oc.ranking_score > 0.24
AND oc.ranking_score < 3.49;
Make sure you have index on offers_clicks - (user_id
) column
and users - (id
,last_active
,country
)
Upvotes: 1
Reputation: 1269773
This is your query:
SELECT count(distinct u.id) AS count_1
FROM offers_clicks oc JOIN
users u
ON oc.user_id = u.id
WHERE u.country IN ('US') AND u.last_active > '2015-02-26' AND
oc.date > '2015-02-14' AND
oc.ranking_score > 0.24 AND oc.ranking_score < 3.49;
First, instead of count(distinct)
, you might consider writing the query as:
SELECT count(*) AS count_1
FROM users u
WHERE u.country IN ('US') AND u.last_active > '2015-02-26' AND
EXISTS (SELECT 1
FROM offers_clicks oc
WHERE oc.user_id = u.id AND
oc.date > '2015-02-14' AND
oc.ranking_score > 0.24 AND oc.ranking_score < 3.49
)
Then, the best indexes for this query are: users(country, last_active, id)
and either offers_clicks(user_id, date, ranking_score)
or offers_clicks(user_id, ranking_score, date)
.
Upvotes: 5