Reputation: 112
I made a table of data points - they have a key (type of data), value (value of data), timestamp (time data was recorded).
Table definition:
CREATE TABLE IF NOT EXISTS datapoints (
point_id int(11) NOT NULL AUTO_INCREMENT,
point_user_id int(11) NOT NULL,
point_key varchar(32) NOT NULL,
point_value longtext NOT NULL,
point_timestamp int(11) NOT NULL,
PRIMARY KEY (point_id),
KEY datapoint_search (point_key,point_timestamp,point_user_id,point_value(64))
) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=0;
Now I make a query that gets all datapoints of a specific type for a date range (1 day in our example) and orders it by the data point value in descending order:
EXPLAIN SELECT * FROM datapoints
WHERE point_key = 'body_temperature'
AND point_timestamp >= UNIX_TIMESTAMP('2013-11-20')
AND point_timestamp < UNIX_TIMESTAMP('2013-11-21')
AND point_user_id = 1
ORDER BY point_value DESC;
Unfortunately, this produces a filesort in the EXPLAIN:
id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
1 | SIMPLE | datapoints | range | datapoint_search | datapoint_search | 106 | NULL | 175642 | Using where; Using filesort
Is it possible to avoid this filesort? I have made an index (datapoint_search), and it is used, yet a filesort is still invoked.
PS. The point_value column has to be text or longtext, or at least handle very large data (up to 8KB) while still being sortable.
Upvotes: 1
Views: 101
Reputation: 10246
Filesort
will not disappear while you are sorting on point_value.
point_value
is indexed just 64 bytes. sorting is done by whole it's data.
I suggest that store point_value_64_prefix
for search and sort point_value
this also has a problem. Sort are done only 64 bytes, sort result is not exactly. but in most case 64 bytes is enough (I guess)
CREATE TABLE IF NOT EXISTS datapoints (
point_id int(11) NOT NULL AUTO_INCREMENT,
point_user_id int(11) NOT NULL,
point_key varchar(32) NOT NULL,
point_value longtext NOT NULL,
point_value_64_prefix VARCHAR(64) NOT NULL, // <= this column added
point_timestamp int(11) NOT NULL,
PRIMARY KEY (point_id),
KEY datapoint_search (point_key,point_timestamp,point_user_id,point_value_64_preifx) // <=
) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=0;
EXPLAIN SELECT * FROM datapoints
WHERE point_key = 'body_temperature'
AND point_timestamp >= UNIX_TIMESTAMP('2013-11-20')
AND point_timestamp < UNIX_TIMESTAMP('2013-11-21')
AND point_user_id = 1
ORDER BY point_value_64_prefix DESC // <= sort by point_value_64_prefix rather than original value.
and, if your sorting data is large, Filesort
can happen in this case you need to increase MySQL temp table size. see http://dev.mysql.com/doc/refman/5.1/en/internal-temporary-tables.html
manual says:
The maximum size for in-memory temporary tables is the minimum of the tmp_table_size and max_heap_table_size values
Upvotes: 1
Reputation: 1269973
The following index satisfies your where clause:
datapoints(point_key, point_user_id, point_timestamp);
This will probably improve the performance of your query significantly, but it won't remove the file sort.
The following could, theoretically:
datapoints(point_key, point_user_id, point_value, point_timestamp);
However, I don't think that MySQL is smart enough to match part of the where
clause and the order by
, with the remaining filtering done after the sort. It is worth a try.
The following will not work:
datapoints(point_key, point_user_id, point_timestamp, point_value);
The data would be retrieved in timestamp order for satisfying the where
clause. The ordering by point_value
is secondary to the timestamp.
EDIT:
If the number of rows found by the where
is "constant", then the performance should be similar. If you don't have too many matches to point_key
, point_user_id
, then the following trick might help:
select dp.*
from (SELECT *
FROM datapoints
WHERE point_key = 'body_temperature' AND point_user_id = 1
ORDER BY point_value DESC
) dp
where point_timestamp >= UNIX_TIMESTAMP('2013-11-20') AND
point_timestamp < UNIX_TIMESTAMP('2013-11-21');
Along with the index datapoints(point_key, point_user_id, point_value)
.
Unfortunately, MySQL does not guarantee that the sort in the inner subquery actually keeps the rows in order for the outer query (I think it does in practice, at least usually). This would use the index for the inner query and then a scan of the temporary table for the second where
clause.
Also, if you don't need all the columns, then I would recommend putting the columns you want into the index. This will save the random scans of the full table when there is a match.
Upvotes: 3
Reputation: 7244
MySQL can use filesort, but it can still be in memory and then the performance is still good.
Bill Karwin gives a good explanation about it here: Any way to avoid a filesort when order by is different to where clause?
Upvotes: 0
Reputation: 395
If you're searching for rows for a specific point_value and point_user_id, in a specific date range, as you are in your query, your index should be on (point_key, point_user_id, point_timestamp). Putting point_timestamp second in the index won't help you unless you have large numbers of rows with identical point_timestamp values (which is probably unlikely).
Upvotes: 0