GrahamB
GrahamB

Reputation: 569

CTE optimisations

I recently noticed a poor performing query that contained a CTE.

From running EXPLAIN it appears that if I have 2 consecutive tables in a CTE, the first of which has a WHERE filter applied the postgres optimizer doesn't actually restrict the row set down, and so the second table lookup is very slow:

WITH thing_data AS (
    SELECT * FROM things WHERE id = '0000000001'
), thing_readings AS (
    SELECT  thing_timestamp
    FROM reading_log_instantaneous_schedule
    INNER JOIN thing_data
    ON thing_id = thing_data.id
    ORDER BY thing_timestamp DESC LIMIT 1
),                   
SELECT thing_data.*
FROM thing_data
LEFT OUTER JOIN thing_readings
ON thing_data.id = thing_readings.thing_id 

Basically the inner join in the readings table is not benefiting from the INNER JOIN thing_data on thing_id = thing_data.id, and is actually doing a scan across all the rows in the reading table.

Is it possible to kick the optimizer to notice that I have limited down the thing_data record set to just one row, therefore making the subsequent join quick, instead of super slow?

Edit: Apologies for a poorly anonymised query.

I have created a SQLFiddle demonstrating the problem I am having - I still have to add 2 WHERE clauses (bad for code maintainability etc) - the problem persists even when forgetting about CTEs and using a regular join table as Craig suggested. I am more used to SQL Server which did not have this issue when I converted the schema.

http://sqlfiddle.com/#!15/17c82/1

Upvotes: -1

Views: 624

Answers (1)

Craig Ringer
Craig Ringer

Reputation: 324891

Answer for PostgreSQL 12 and above

(non-writeable) CTEs are now optimised through like subqueries etc, unless NOT MATERIALIZED is applied on them. See https://www.postgresql.org/docs/current/queries-with.html . So you would no longer see this behaviour on Pg 12.

Answer for PostgreSQL 11 and below:

Is it possible to kick the optimizer to notice that I have limited down the thing_data record set to just one row, therefore making the subsequent join quick, instead of super slow?

Not in PostgreSQL, at least in 9.4 or older. Hopefully it'll change later.

CTEs in PostgreSQL are optimisation fences - essentially, the planner can't push qualifiers down into them, or pull qualifiers up out of them.

When this is a problem, you need to go back to using a subquery in the FROM clause, old-style.

SELECT thing_data.*
FROM (
    SELECT * FROM things WHERE id = '0000000001'
) data_thing
LEFT OUTER JOIN (
    SELECT  thing_timestamp 
    FROM reading_log_instantaneous_schedule 
    INNER JOIN thing_data on thing_id = thing_data.id 
    ORDER BY thing_timestamp DESC LIMIT 1)
thing_readings
ON thing_data.id = thing_readings.thing_id;

since a subquery in FROM does permit qualifier push-down / pull-up. In this case, though, you really want to apply the WHERE clause laterally. This is best done by further simplifying, getting rid of the subquery:

SELECT thing_data.*
FROM things
LEFT OUTER JOIN (
    SELECT  thing_timestamp 
    FROM reading_log_instantaneous_schedule 
    INNER JOIN thing_data on thing_id = thing_data.id 
    ORDER BY thing_timestamp DESC LIMIT 1)
thing_readings 
ON thing_data.id = thing_readings.thing_id
WHERE things.id = '0000000001'

However, the whole thing appears to be a hugely overcomplicated way of just doing this (based on your SQLFiddle http://sqlfiddle.com/#!15/17c82/3):

SELECT  things.*, thingreadings.reading 
FROM things 
LEFT OUTER JOIN thingreadings ON thingreadings.thingid = things.id
WHERE things.id = '1'
ORDER BY reading DESC LIMIT 1;

Upvotes: 4

Related Questions