Reputation: 77
I have a query in a function to select the top row and another for the last row, each query takes around 300ms to execute, and this query is executed a lot of times making the function useless
This is the query (this is a test, in the function parameters change):
SELECT the_geom
FROM "Entries"
WHERE taxiid= 366 and timestamp between '2008-02-06 16:00:00' and timestamp '2008-02-06 16:00:00' + interval '5 minutes'
ORDER BY entryid DESC
LIMIT 1;;
and this is the EXPLAIN ANALYZE output of the query:
QUERY PLAN
--------------------------------------------------------------------- -------------------------------------------------------------------------------------------------------
Seq Scan on "Entries" (cost=0.00..63538.80 rows=70 width=51) (actual time=184.409..342.049 rows=56 loops=1)
Filter: (("timestamp" >= '2008-02-06 16:00:00'::timestamp without time zone) AND ("timestamp" <= '2008-02-06 16:05:00'::timestamp without time zone
) AND (taxiid = 366))
Rows Removed by Filter: 2128847
Planning time: 0.191 ms
Execution time: 342.088 ms
(5 rows)
Is there a better way of getting top and last row?
EDIT: Thanks Drunix, that did help but, something that i cant understand is happening, with the index you suges i was able to go from ~300 ms to 0.2 ms
but if i change the time interval that is added to the timestamp to 120 minutes the index is not used and it keeps taking 300 ms
here is the proof(5 minute interval):
snowflake=# explain analyze Select the_geom from "Entries"
where taxiid= 366 and "timestamp" between '2008-02-06 16:00:00' and "timestamp" '2008-02-06 16:00:00' + interval '5 minutes'
ORDER BY entryid ASC
LIMIT 1;
QUERY PLAN
-------------------------------------------------------------------------
Limit (cost=149.52..149.52 rows=1 width=55) (actual time=0.129..0.129 rows=1 loops=1)
-> Sort (cost=149.52..149.70 rows=73 width=55) (actual time=0.127..0.127 rows=1 loops=1)
Sort Key: entryid
Sort Method: top-N heapsort Memory: 25kB
-> Index Scan using entriesindex on "Entries" (cost=0.43..149.15 rows=73 width=55) (actual time=0.045..0.090 rows=56 loops=1)
Index Cond: ((taxiid = 366) AND ("timestamp" >= '2008-02-06 16:00:00'::timestamp without time zone) AND ("timestamp" <= '2008-02-06 16:
05:00'::timestamp without time zone))
Planning time: 0.266 ms
Execution time: 0.180 ms
(8 rows)
the other one(120 minutes interval):
snowflake=# explain analyze Select the_geom from "Entries"
where taxiid= 366 and "timestamp" between '2008-02-06 16:00:00' and "timestamp" '2008-02-06 16:00:00' + interval '120 minutes'
ORDER BY entryid ASC
LIMIT 1;
QUERY PLAN
-------------------------------------------------------------------------
Limit (cost=0.43..60.02 rows=1 width=55) (actual time=245.570..245.570 rows=1 loops=1)
-> Index Scan using "Entries_pkey" on "Entries" (cost=0.43..97542.75 rows=1637 width=55) (actual time=245.568..245.568 rows=1 loops=1)
Filter: (("timestamp" >= '2008-02-06 16:00:00'::timestamp without time zone) AND ("timestamp" <= '2008-02-06 18:00:00'::timestamp without tim
e zone) AND (taxiid = 366))
Rows Removed by Filter: 853963
Planning time: 0.277 ms
Execution time: 245.616 ms
Upvotes: 0
Views: 435
Reputation: 3343
Ok, rephrasing my comment as an answer:
Unless you already have it you should create a composite index:
create index somename on Entries(taxiid, timestamp);
According to your execution plan the combination of these fields should be rather selective, therefore an index scan should be more efficient. Note that an index on (timestamp, taxiid)
is probably much less useful, because it will only be used to limit the row by timestamp. Put the columns that are checked for equality in front in similar cases.
Upvotes: 1