Bjoern

Reputation: 16304

MySQL performance improvements for a sorted query on a large table

Table structure:

CREATE TABLE `mytable` (
  `id` varchar(8) NOT NULL,
  `event` varchar(32) NOT NULL,
  `event_date` date NOT NULL,
  `event_time` time NOT NULL,
  KEY `id` (`id`) 
) ENGINE=MyISAM DEFAULT CHARSET=utf8

The data in this table looks like this:

 id      | event      | event_date  | event_time
---------+------------+-------------+-------------
ref1     | someevent1 | 2010-01-01  | 01:23:45
ref1     | someevent2 | 2010-01-01  | 02:34:54
ref1     | someevent3 | 2010-01-18  | 01:23:45
ref2     | someevent4 | 2012-10-05  | 22:23:21
ref2     | someevent5 | 2012-11-21  | 11:22:33

The table contains about 500,000,000 records similar to these.

The query I'd like to ask about here looks like this:

SELECT     *
FROM       `mytable`
WHERE      `id` = 'ref1'
ORDER BY   event_date DESC,
           event_time DESC
LIMIT      0, 500

The EXPLAIN output looks like:

select_type:   SIMPLE
table:         E
type:          ref
possible_keys: id
key:           id
key_len:       27
ref:           const     
rows:          17024 (a common example)
Extra:         Using where; Using filesort

Purpose: This query is generated by a website. The LIMIT values drive the page navigation element, so if the user wants to see older entries, they are adjusted to 500, 500, then 1000, 500 and so on.

Since some values of id appear in a large number of rows, the query gets slower as the number of matching rows grows. Profiling the slow queries showed that sorting is the cause: the MySQL server spends most of the query time sorting the data. Adding indexes on the fields event_date and event_time didn't change that very much.

Example SHOW PROFILE Result, sorted by duration:

state          | duration/sec | percentage
---------------|--------------|-----------
Sorting result |     12.00145 |   99.80640
Sending data   |      0.01978 |    0.16449
statistics     |      0.00289 |    0.02403
freeing items  |      0.00028 |    0.00233
...
Total          |     12.02473 |  100.00000
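
For reference, a breakdown like the one above can be collected roughly as follows. This is only a sketch: it assumes session profiling is available on the server and that the slow query was assigned Query_ID 1, which has to be checked with SHOW PROFILES first.

```sql
-- Enable profiling for the current session, then run the slow query.
SET profiling = 1;

-- List the profiled statements with their Query_ID.
SHOW PROFILES;

-- Assumption: the slow query has Query_ID 1.
SHOW PROFILE FOR QUERY 1;

-- Aggregate per state and sort by duration, with a percentage column.
SELECT   STATE,
         SUM(DURATION) AS duration_sec,
         ROUND(100 * SUM(DURATION) /
               (SELECT SUM(DURATION)
                FROM   INFORMATION_SCHEMA.PROFILING
                WHERE  QUERY_ID = 1), 5) AS percentage
FROM     INFORMATION_SCHEMA.PROFILING
WHERE    QUERY_ID = 1
GROUP BY STATE
ORDER BY duration_sec DESC;
```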

Now the question:

Before delving deeper into MySQL variables like sort_buffer_size and other server configuration options, can you think of any way to change the query or the sorting behaviour so that sorting is no longer such a performance bottleneck, while the query still serves its purpose?

I don't mind a bit of out-of-the-box-thinking.

Thank you in advance!

Upvotes: 1

Views: 179

Answers (3)

sufleR

Reputation: 2973

As I wrote in the comment, a multi-column index on (id, event_date desc, event_time desc) may help.
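
A minimal sketch of such an index, assuming the table from the question. The index name idx_id_date_time is made up for illustration. It is written without the DESC keywords, since older MySQL versions parse but ignore them in index definitions; whether adding them makes a difference depends on the server version.

```sql
-- Hypothetical index name; the columns are those of `mytable` in the question.
-- On a table of this size, building the index may take a long time.
ALTER TABLE `mytable`
  ADD INDEX `idx_id_date_time` (`id`, `event_date`, `event_time`);
```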

If this table will grow fast, you should consider adding an option in the application for the user to select data for a particular date range.

Example: The first step always returns 500 records, but to fetch further records the user has to set a date range and then paginate within it.
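
As a rough sketch, the date-restricted variant of the query could look like this. The date bounds are placeholders that the application would fill in from the user's selection; they are not part of the original question.

```sql
-- Placeholder date range; the application supplies the real bounds.
SELECT     *
FROM       `mytable`
WHERE      `id` = 'ref1'
  AND      `event_date` BETWEEN '2010-01-01' AND '2010-01-31'
ORDER BY   event_date DESC,
           event_time DESC
LIMIT      0, 500;
```

The narrower the date range, the fewer rows match the WHERE clause and the less there is to sort.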

Upvotes: 2

Neville Kuyt

Reputation: 29629

I would start by doing what sufleR suggests - the multi-column index on (id, event_date desc, event_time desc).

However, according to http://dev.mysql.com/doc/refman/5.0/en/create-index.html, the DESC keyword is accepted in an index definition but doesn't actually do anything. That's a bit of a pain, so try it and see whether it improves the performance, but it probably won't.

If that's the case, you may have to cheat by creating a "sort_column" with an automatically decrementing value (I'm pretty sure you'd have to do this in the application layer; I don't think you can auto-decrement in MySQL), and add that column to the index.

You'd end up with:

 id      | event      | event_date  | event_time | sort_value
---------+------------+-------------+------------+------------
ref1     | someevent1 | 2010-01-01  | 01:23:45   | 0
ref1     | someevent2 | 2010-01-01  | 02:34:54   | -1
ref1     | someevent3 | 2010-01-18  | 01:23:45   | -2
ref2     | someevent4 | 2012-10-05  | 22:23:21   | -3
ref2     | someevent5 | 2012-11-21  | 11:22:33   | -4

and an index on id and sort_value.

Dirty, but the only other suggestion is to reduce the number of records matching the WHERE clause in other ways, for instance by changing the interface to return records for a given date rather than a fixed 500 records.
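
A sketch of how that could look, under the assumption that the application assigns sort_value itself, with smaller values for newer events. The column type, the default, and the index name idx_id_sort_value are illustrative choices, and existing rows would still need their sort_value filled in.

```sql
-- Assumption: sort_value is maintained by the application, not by MySQL.
ALTER TABLE `mytable`
  ADD COLUMN `sort_value` BIGINT NOT NULL DEFAULT 0,
  ADD INDEX `idx_id_sort_value` (`id`, `sort_value`);

-- Newer events carry smaller values, so an ascending sort returns them first.
SELECT     *
FROM       `mytable`
WHERE      `id` = 'ref1'
ORDER BY   `sort_value` ASC
LIMIT      0, 500;
```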

Upvotes: 1

histocrat

Reputation: 2381

Indexing is most likely the solution; you just have to do it right. See the MySQL reference page for this.

The most effective way to do it is to create a three-part index on (id, event_date, event_time). You can specify event_date desc, event_time desc in the index, but I don't think it's necessary.
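
One way to check whether the three-part index is actually used for the sort is to run EXPLAIN on the original query again. This sketch assumes the index on (id, event_date, event_time) has already been created.

```sql
EXPLAIN
SELECT     *
FROM       `mytable`
WHERE      `id` = 'ref1'
ORDER BY   event_date DESC,
           event_time DESC
LIMIT      0, 500;
-- If the optimizer picks the three-part index, the `key` column should name it
-- and `Extra` should no longer contain "Using filesort".
```

Because both ORDER BY columns are sorted in the same direction, MySQL can read an ascending index backwards to produce this order.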

Upvotes: 1
