Reputation: 55
I'm trying to optimize a query from a table with 3M rows.
The columns are value
, datetime
and point_id
.
SELECT DATE(datetime), MAX(value) FROM historical_points WHERE point_id=1 GROUP BY DATE(datetime);
This query takes 2 seconds.
I tried indexing the point_id=1
but the results were not much better.
Is it possible to index the MAX
query or is there a better way to do it? Maybe with an INNER JOIN?
EDIT: This is the explain analyze of similar one, that is tackling the case better. This one also ha performance problem.
EXPLAIN ANALYZE SELECT DATE(datetime), MAX(value), MIN(value) FROM buildings_hispoint WHERE point_id=64 AND datetime BETWEEN '2017-09-01 00:00:00' AND '2017-10-01 00:00:00' GROUP BY DATE(datetime);
>GroupAggregate (cost=84766.65..92710.99 rows=336803 width=68) (actual time=1461.060..2701.145 rows=21 loops=1) > Group Key: (date(datetime)) > -> Sort (cost=84766.65..85700.23 rows=373430 width=14) (actual time=1408.445..1547.929 rows=523621 loops=1) > Sort Key: (date(datetime)) > Sort Method: external sort Disk: 11944kB > -> Bitmap Heap Scan on buildings_hispoint (cost=10476.02..43820.81 rows=373430 width=14) (actual time=148.970..731.154 rows=523621 loops=1) > Recheck Cond: (point_id = 64) > Filter: ((datetime >= '2017-09-01 00:00:00+02'::timestamp with time zone) AND (datetime Rows Removed by Filter: 35712 > Heap Blocks: exact=14422 > -> Bitmap Index Scan on buildings_measurementdatapoint_ffb10c68 (cost=0.00..10382.67 rows=561898 width=0) (actual time=125.150..125.150 rows=559333 loops=1) > Index Cond: (point_id = 64) >Planning time: 0.284 ms >Execution time: 2704.566 ms
Upvotes: 1
Views: 2158
Reputation: 19320
SELECT DISTINCT ON (DATE(datetime)) DATE(datetime), value
FROM historical_points WHERE point_id=1
ORDER BY DATE(datetime) DESC, value DESC;
Put an computed index on DATE(datetime), value
. [I hope those aren't your real column names. Using reserved words like VALUE
as a column name is a recipe for confusion.]
The SELECT DISTINCT
will work like a GROUP ON
. The ORDER BY
replaces the MAX
, and will be fast if indexed.
I owe this technique to @ErwinBrandstetter.
Upvotes: 1
Reputation: 11115
Without seeing EXPLAIN
output is difficult to say something. My guess is that you must include DATE()
call on index definition:
CREATE INDEX historical_points_idx ON historical_points (DATE(datetime), point_id);
Also, if point_id
has more distinct values than DATE(datetime)
then you must reverse column order:
CREATE INDEX historical_points_idx ON historical_points (point_id, DATE(datetime));
Keep in mind that cardinality of columns is very important to the planner, columns with high selectivity is preferred to go first.
Upvotes: 1