sdover102
sdover102

Reputation: 124

MySQL Performance issues / slow query with large amounts of data

MySql

I've a query that is taking sometime to load on a table, named impression that has about 57 million rows. Table definition can be found below:

+-----------------+--------------+------+-----+
| Field           | Type         | Null | Key |
+-----------------+--------------+------+-----+
| id              | int(11)      | NO   | PRI |
| data_type       | varchar(16)  | NO   | MUL |
| object_id       | int(11)      | YES  |     |
| user_id         | int(11)      | YES  |     |
| posted          | timestamp    | NO   | MUL |
| lat             | float        | NO   |     |
| lng             | float        | NO   |     |
| region_id       | int(11)      | NO   |     |
+-----------------+--------------+------+-----+

The indexes on the table are:

+------------+------------+----------+--------------+-------------+
| Table      | Non_unique | Key_name | Seq_in_index | Column_name |
+------------+------------+----------+--------------+-------------+
| impression |          0 | PRIMARY  |            1 | id          |
| impression |          1 | posted   |            1 | posted      |
| impression |          1 | oi_dt    |            1 | data_type   |
| impression |          1 | oi_dt    |            2 | object_id   |
+------------+------------+----------+--------------+-------------+

A typical select statement goes something like:

SELECT COUNT(`id`)
FROM `impression`
WHERE 
  posted BETWEEN DATE('2014-01-04') AND DATE('2014-06-01')
  AND `data_type` =  'event'
  AND `object_id` IN ('1', '2', '3', '4', '5', '8', ...)

...and a typical record looks like (in order of the schema above):

'event', 1234, 81, '2014-01-02 00:00:01', 35.3, -75.2, 10

This statement takes approximately 26 seconds to run, which is where the problem lies. Are there any solutions that can be employed here to reduce this time to well below what it is now? Ideally it'd be < 1 second.

I'm open to switching storage solutions / etc... anything that'll help at this point. Your assistance is most appreciated.

Other things possibly worth noting:

Upvotes: 0

Views: 137

Answers (2)

Jeff
Jeff

Reputation: 167

Not sure if this is a viable solution for you, but partitioning may speed it up. I have a similar table for impressions and found the following to help it a lot. I'm querying mostly on the current day though.

ALTER TABLE impression PARTITION BY RANGE(TO_DAYS(posted))(
  PARTITION beforeToday VALUES LESS THAN(735725),
  PARTITION today       VALUES LESS THAN(735726),
  PARTITION future      VALUES LESS THAN MAXVALUE
);

This does incur some maintenance (has to be updated often to get the benefits). If you are looking to query on a broader range, less maintenance would be required I think.

Upvotes: 0

Bill Karwin
Bill Karwin

Reputation: 562280

MySQL usually uses only one index per table in a given query. You have an index on posted and a compound index on data_type, object_id.

You should use EXPLAIN to find out which index your query is currently using. EXPLAIN will also tell you how many rows it estimates it will examine to produce the result set (it might examine many more rows than make it into the final result).

The columns should be in this order:

  1. Columns in equality conditions, for example in your query data_type = 'event'

  2. Columns in range conditions or sorting, but you only get one such column. Subsequent columns that are in range conditions or sorting do not gain any benefit from being added to the index after the first such column. So pick the column that is the most selective, that is, your condition narrows down the search to a smaller subset of the table.

  3. Other columns in your select-list, if you have just a few such columns and you want to get the covering index effect. It's not necessary to add your primary key column if you use InnoDB, because every secondary index automatically includes the primary key column at the right end, even if you don't declare that.

So in your case, you might be better off with an index on data_type, posted. Try it and use EXPLAIN to confirm. It depends on whether the date range you give is more selective than the list of object_id's.

See also my presentation How to Design Indexes, Really.

Upvotes: 4

Related Questions