Reputation: 124
MySql
I've a query that is taking sometime to load on a table, named impression
that
has about 57 million rows. Table definition can be found below:
+-----------------+--------------+------+-----+
| Field | Type | Null | Key |
+-----------------+--------------+------+-----+
| id | int(11) | NO | PRI |
| data_type | varchar(16) | NO | MUL |
| object_id | int(11) | YES | |
| user_id | int(11) | YES | |
| posted | timestamp | NO | MUL |
| lat | float | NO | |
| lng | float | NO | |
| region_id | int(11) | NO | |
+-----------------+--------------+------+-----+
The indexes on the table are:
+------------+------------+----------+--------------+-------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name |
+------------+------------+----------+--------------+-------------+
| impression | 0 | PRIMARY | 1 | id |
| impression | 1 | posted | 1 | posted |
| impression | 1 | oi_dt | 1 | data_type |
| impression | 1 | oi_dt | 2 | object_id |
+------------+------------+----------+--------------+-------------+
A typical select statement goes something like:
SELECT COUNT(`id`)
FROM `impression`
WHERE
posted BETWEEN DATE('2014-01-04') AND DATE('2014-06-01')
AND `data_type` = 'event'
AND `object_id` IN ('1', '2', '3', '4', '5', '8', ...)
...and a typical record looks like (in order of the schema above):
'event', 1234, 81, '2014-01-02 00:00:01', 35.3, -75.2, 10
This statement takes approximately 26 seconds to run, which is where the problem lies. Are there any solutions that can be employed here to reduce this time to well below what it is now? Ideally it'd be < 1 second.
I'm open to switching storage solutions / etc... anything that'll help at this point. Your assistance is most appreciated.
Other things possibly worth noting:
Upvotes: 0
Views: 137
Reputation: 167
Not sure if this is a viable solution for you, but partitioning may speed it up. I have a similar table for impressions and found the following to help it a lot. I'm querying mostly on the current day though.
ALTER TABLE impression PARTITION BY RANGE(TO_DAYS(posted))(
PARTITION beforeToday VALUES LESS THAN(735725),
PARTITION today VALUES LESS THAN(735726),
PARTITION future VALUES LESS THAN MAXVALUE
);
This does incur some maintenance (has to be updated often to get the benefits). If you are looking to query on a broader range, less maintenance would be required I think.
Upvotes: 0
Reputation: 562280
MySQL usually uses only one index per table in a given query. You have an index on posted
and a compound index on data_type
, object_id
.
You should use EXPLAIN to find out which index your query is currently using. EXPLAIN will also tell you how many rows it estimates it will examine to produce the result set (it might examine many more rows than make it into the final result).
The columns should be in this order:
Columns in equality conditions, for example in your query data_type = 'event'
Columns in range conditions or sorting, but you only get one such column. Subsequent columns that are in range conditions or sorting do not gain any benefit from being added to the index after the first such column. So pick the column that is the most selective, that is, your condition narrows down the search to a smaller subset of the table.
Other columns in your select-list, if you have just a few such columns and you want to get the covering index effect. It's not necessary to add your primary key column if you use InnoDB, because every secondary index automatically includes the primary key column at the right end, even if you don't declare that.
So in your case, you might be better off with an index on data_type
, posted
. Try it and use EXPLAIN to confirm. It depends on whether the date range you give is more selective than the list of object_id's.
See also my presentation How to Design Indexes, Really.
Upvotes: 4