Reputation: 569
I recently noticed a poor performing query that contained a CTE.
From running EXPLAIN
it appears that if I have 2 consecutive tables in a CTE, the first of which has a WHERE
filter applied the postgres optimizer doesn't actually restrict the row set down, and so the second table lookup is very slow:
WITH thing_data AS (
SELECT * FROM things WHERE id = '0000000001'
), thing_readings AS (
SELECT thing_timestamp
FROM reading_log_instantaneous_schedule
INNER JOIN thing_data
ON thing_id = thing_data.id
ORDER BY thing_timestamp DESC LIMIT 1
),
SELECT thing_data.*
FROM thing_data
LEFT OUTER JOIN thing_readings
ON thing_data.id = thing_readings.thing_id
Basically the inner join in the readings table is not benefiting from the INNER JOIN thing_data on thing_id = thing_data.id
, and is actually doing a scan across all the rows in the reading table.
Is it possible to kick the optimizer to notice that I have limited down the thing_data
record set to just one row, therefore making the subsequent join quick, instead of super slow?
Edit: Apologies for a poorly anonymised query.
I have created a SQLFiddle demonstrating the problem I am having - I still have to add 2 WHERE
clauses (bad for code maintainability etc) - the problem persists even when forgetting about CTEs and using a regular join table as Craig suggested. I am more used to SQL Server which did not have this issue when I converted the schema.
http://sqlfiddle.com/#!15/17c82/1
Upvotes: -1
Views: 624
Reputation: 324891
(non-writeable) CTEs are now optimised through like subqueries etc, unless NOT MATERIALIZED
is applied on them. See https://www.postgresql.org/docs/current/queries-with.html . So you would no longer see this behaviour on Pg 12.
Is it possible to kick the optimizer to notice that I have limited down the thing_data record set to just one row, therefore making the subsequent join quick, instead of super slow?
Not in PostgreSQL, at least in 9.4 or older. Hopefully it'll change later.
CTEs in PostgreSQL are optimisation fences - essentially, the planner can't push qualifiers down into them, or pull qualifiers up out of them.
When this is a problem, you need to go back to using a subquery in the FROM
clause, old-style.
SELECT thing_data.*
FROM (
SELECT * FROM things WHERE id = '0000000001'
) data_thing
LEFT OUTER JOIN (
SELECT thing_timestamp
FROM reading_log_instantaneous_schedule
INNER JOIN thing_data on thing_id = thing_data.id
ORDER BY thing_timestamp DESC LIMIT 1)
thing_readings
ON thing_data.id = thing_readings.thing_id;
since a subquery in FROM
does permit qualifier push-down / pull-up. In this case, though, you really want to apply the WHERE
clause laterally. This is best done by further simplifying, getting rid of the subquery:
SELECT thing_data.*
FROM things
LEFT OUTER JOIN (
SELECT thing_timestamp
FROM reading_log_instantaneous_schedule
INNER JOIN thing_data on thing_id = thing_data.id
ORDER BY thing_timestamp DESC LIMIT 1)
thing_readings
ON thing_data.id = thing_readings.thing_id
WHERE things.id = '0000000001'
However, the whole thing appears to be a hugely overcomplicated way of just doing this (based on your SQLFiddle http://sqlfiddle.com/#!15/17c82/3):
SELECT things.*, thingreadings.reading
FROM things
LEFT OUTER JOIN thingreadings ON thingreadings.thingid = things.id
WHERE things.id = '1'
ORDER BY reading DESC LIMIT 1;
Upvotes: 4