khromov
khromov

Reputation: 112

How to avoid filesort in this simple query? (No joins)

I made a table of data points - they have a key (type of data), value (value of data), timestamp (time data was recorded).

Table definition:

CREATE TABLE IF NOT EXISTS datapoints (
  point_id int(11) NOT NULL AUTO_INCREMENT,
  point_user_id int(11) NOT NULL,
  point_key varchar(32) NOT NULL,
  point_value longtext NOT NULL,
  point_timestamp int(11) NOT NULL,
  PRIMARY KEY (point_id),
  KEY datapoint_search (point_key,point_timestamp,point_user_id,point_value(64))
) ENGINE=InnoDB  DEFAULT CHARSET=utf8 AUTO_INCREMENT=0;

Now I make a query that gets all datapoints of a specific type for a date range (1 day in our example) and orders it by the data point value in descending order:

EXPLAIN SELECT * FROM datapoints
WHERE point_key = 'body_temperature'
      AND point_timestamp >= UNIX_TIMESTAMP('2013-11-20')
      AND point_timestamp < UNIX_TIMESTAMP('2013-11-21')
      AND point_user_id = 1
ORDER BY point_value DESC;

Unfortunately, this produces a filesort in the EXPLAIN:

id | select_type | table      | type  | possible_keys    | key              | key_len | ref  | rows   | Extra                       
 1 | SIMPLE      | datapoints | range | datapoint_search | datapoint_search | 106     | NULL | 175642 | Using where; Using filesort

Is it possible to avoid this filesort? I have made an index (datapoint_search), and it is used, yet a filesort is still invoked.

PS. The point_value column has to be text or longtext, or at least handle very large data (up to 8KB) while still being sortable.

Upvotes: 1

Views: 101

Answers (4)

Jason Heo
Jason Heo

Reputation: 10246

Filesort will not disappear while you are sorting on point_value. point_value is indexed just 64 bytes. sorting is done by whole it's data. I suggest that store point_value_64_prefix for search and sort point_value

this also has a problem. Sort are done only 64 bytes, sort result is not exactly. but in most case 64 bytes is enough (I guess)

CREATE TABLE IF NOT EXISTS datapoints (
  point_id int(11) NOT NULL AUTO_INCREMENT,
  point_user_id int(11) NOT NULL,
  point_key varchar(32) NOT NULL,
  point_value longtext NOT NULL,
  point_value_64_prefix VARCHAR(64) NOT NULL, // <= this column added
  point_timestamp int(11) NOT NULL,
  PRIMARY KEY (point_id),
  KEY datapoint_search (point_key,point_timestamp,point_user_id,point_value_64_preifx) // <= 
) ENGINE=InnoDB  DEFAULT CHARSET=utf8 AUTO_INCREMENT=0;

EXPLAIN SELECT * FROM datapoints
WHERE point_key = 'body_temperature'
      AND point_timestamp >= UNIX_TIMESTAMP('2013-11-20')
      AND point_timestamp < UNIX_TIMESTAMP('2013-11-21')
      AND point_user_id = 1
ORDER BY point_value_64_prefix DESC // <= sort by point_value_64_prefix rather than original value.

and, if your sorting data is large, Filesort can happen in this case you need to increase MySQL temp table size. see http://dev.mysql.com/doc/refman/5.1/en/internal-temporary-tables.html

manual says:

The maximum size for in-memory temporary tables is the minimum of the tmp_table_size and max_heap_table_size values

Upvotes: 1

Gordon Linoff
Gordon Linoff

Reputation: 1269973

The following index satisfies your where clause:

datapoints(point_key, point_user_id, point_timestamp);

This will probably improve the performance of your query significantly, but it won't remove the file sort.

The following could, theoretically:

datapoints(point_key, point_user_id, point_value, point_timestamp);

However, I don't think that MySQL is smart enough to match part of the where clause and the order by, with the remaining filtering done after the sort. It is worth a try.

The following will not work:

datapoints(point_key, point_user_id, point_timestamp, point_value);

The data would be retrieved in timestamp order for satisfying the where clause. The ordering by point_value is secondary to the timestamp.

EDIT:

If the number of rows found by the where is "constant", then the performance should be similar. If you don't have too many matches to point_key, point_user_id, then the following trick might help:

select dp.*
from (SELECT *
      FROM datapoints
      WHERE point_key = 'body_temperature' AND point_user_id = 1
      ORDER BY point_value DESC
     ) dp
where point_timestamp >= UNIX_TIMESTAMP('2013-11-20') AND
      point_timestamp < UNIX_TIMESTAMP('2013-11-21');

Along with the index datapoints(point_key, point_user_id, point_value).

Unfortunately, MySQL does not guarantee that the sort in the inner subquery actually keeps the rows in order for the outer query (I think it does in practice, at least usually). This would use the index for the inner query and then a scan of the temporary table for the second where clause.

Also, if you don't need all the columns, then I would recommend putting the columns you want into the index. This will save the random scans of the full table when there is a match.

Upvotes: 3

Mad Dog Tannen
Mad Dog Tannen

Reputation: 7244

MySQL can use filesort, but it can still be in memory and then the performance is still good.

Bill Karwin gives a good explanation about it here: Any way to avoid a filesort when order by is different to where clause?

Upvotes: 0

jokeeffe
jokeeffe

Reputation: 395

If you're searching for rows for a specific point_value and point_user_id, in a specific date range, as you are in your query, your index should be on (point_key, point_user_id, point_timestamp). Putting point_timestamp second in the index won't help you unless you have large numbers of rows with identical point_timestamp values (which is probably unlikely).

Upvotes: 0

Related Questions