Reputation: 2790
My queries get very slow when I add a limit 1.
I have a table object_values with timestamped values for objects:
timestamp  | objectID | value
-----------+----------+--------
2014-01-27 | 234      | ksghdf
Per object I want to get the latest value:
SELECT * FROM object_values WHERE (objectID = 53708) ORDER BY timestamp DESC LIMIT 1;
(I cancelled the query after more than 10 minutes)
This query is very slow when there are no values for a given objectID (it is fast if there are results). If I remove the limit it tells me nearly instantaneous that there are no results:
SELECT * FROM object_values WHERE (objectID = 53708) ORDER BY timestamp DESC;
...
Time: 0.463 ms
An explain shows me that the query without limit uses the index, whereas the query with limit 1 does not make use of the index:
Slow query:
explain SELECT * FROM object_values WHERE (objectID = 53708) ORDER BY timestamp DESC limit 1;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..2350.44 rows=1 width=126)
   ->  Index Scan Backward using object_values_timestamp on object_values  (cost=0.00..3995743.59 rows=1700 width=126)
         Filter: (objectID = 53708)
Fast query:
explain SELECT * FROM object_values WHERE (objectID = 53708) ORDER BY timestamp DESC;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------
 Sort  (cost=6540.86..6545.11 rows=1700 width=126)
   Sort Key: timestamp
   ->  Index Scan using object_values_objectID on object_values  (cost=0.00..6449.65 rows=1700 width=126)
         Index Cond: (objectID = 53708)
The table contains 44,884,559 rows and 66,762 distinct objectIDs.
I have separate indexes on both fields: timestamp and objectID.
I have done a vacuum analyze on the table and I have reindexed it.
Additionally, the slow query becomes fast when I set the limit to 3 or higher:
explain SELECT * FROM object_values WHERE (objectID = 53708) ORDER BY timestamp DESC limit 3;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
 Limit  (cost=6471.62..6471.63 rows=3 width=126)
   ->  Sort  (cost=6471.62..6475.87 rows=1700 width=126)
         Sort Key: timestamp
         ->  Index Scan using object_values_objectID on object_values  (cost=0.00..6449.65 rows=1700 width=126)
               Index Cond: (objectID = 53708)
In general I assume it has to do with the planner making wrong assumptions about the execution costs and therefore choosing a slower execution plan.
Is this the real reason? Is there a solution for this?
Upvotes: 77
Views: 43705
Reputation: 1
I can show you my custom method to handle that. This method may not be the right way, but it works fast. Here is my PostgreSQL function:
CREATE OR REPLACE FUNCTION get_limited_object_values(_object_id BIGINT, _limit INTEGER)
    RETURNS SETOF object_values
    LANGUAGE plpgsql AS
$$
DECLARE
    row_object_value object_values;
    index INT := 0;
BEGIN
    -- Scan this object's rows newest-first and stop once _limit rows have been returned.
    FOR row_object_value IN (SELECT * FROM object_values WHERE (objectID = _object_id) ORDER BY timestamp DESC) LOOP
        IF index >= _limit THEN
            EXIT;
        END IF;
        RETURN NEXT row_object_value;
        index := index + 1;
    END LOOP;
    RETURN;
END;
$$;
You can use it like:
SELECT * FROM get_limited_object_values(53708, 1);
Upvotes: 0
Reputation: 5353
You can avoid this issue by adding an extra, otherwise unneeded column to the ORDER BY clause of the query. The single-column index on timestamp can then no longer satisfy the requested sort order, so the planner stops picking the slow backward index scan:
SELECT * FROM object_values
WHERE (objectID = 53708)
ORDER BY timestamp DESC, objectID
limit 1;
Upvotes: 81
Reputation: 78463
You're running into an issue which relates, I think, to the lack of statistics on row correlations. Consider reporting it to pg-bugs for reference if this is happening on the latest version of Postgres.
The interpretation I'd suggest for your plans is:
- limit 1 makes Postgres look for a single row, and in doing so it assumes that your object_id is common enough that it'll show up reasonably quickly in an index scan. Based on the stats you gave, its thinking probably is that it'll need to read ~70 rows on average to find one row that fits; it just doesn't realize that object_id and timestamp correlate to the point where it's actually going to read a large portion of the table.
- limit 3, in contrast, makes it realize that it's uncommon enough, so it seriously considers (and ends up…) top-n sorting an expected 1700 rows with the object_id you want, on grounds that doing so is likely cheaper. For instance, it might know that the distribution of these rows is such that they're all packed in the same area on the disk.
- no limit clause means it'll fetch the 1700 rows anyway, so it goes straight for the index on object_id.
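Reading the posted plans that way, the numbers roughly line up: the limit 1 plan is costed at 2350.44, which is about the backward scan's total cost of 3995743.59 divided by the 1700 matching rows the planner expects, while three times that (around 7051) already exceeds the 6471.63 of the sort-based plan, which is presumably why limit 3 flips to the other strategy.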
Add an index on (object_id, timestamp) or (object_id, timestamp desc).
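A minimal sketch of that fix, using the table and column names from the question (the index name is just illustrative):

-- Composite index so the planner can satisfy both the filter and the sort:
CREATE INDEX object_values_objectid_ts_idx
    ON object_values (objectID, timestamp DESC);

-- With the composite index in place, the original query can read the newest
-- row for a single objectID straight from the index:
SELECT * FROM object_values
WHERE objectID = 53708
ORDER BY timestamp DESC
LIMIT 1;

Either column order for the timestamp part works for this particular query: with objectID pinned by the equality condition, Postgres can also scan an (objectID, timestamp) index backward to get the rows in timestamp DESC order.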
Upvotes: 56
Reputation: 66751
Not a fix, but sure enough switching from limit 1 to limit 50 (for me) and returning the first result row is way faster...Postgres 9.x in this instance. Just thought I'd mention it as a workaround along the lines of the one the OP mentioned.
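A sketch of that workaround against the question's query (the exact limit is arbitrary; the application just keeps the first row returned):

SELECT * FROM object_values
WHERE objectID = 53708
ORDER BY timestamp DESC
LIMIT 50;  -- over-fetch, then take only the first row in application code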
Upvotes: 1
Reputation: 2444
I started having similar symptoms on an update-heavy table, and what was needed in my case was
analyze $table_name;
Refreshing the statistics fixed the slow query plans that were occurring.
Supporting docs: https://www.postgresql.org/docs/current/sql-analyze.html
Upvotes: 14