Why are common table expressions faster even when limiting the result?

Question

I can't get my head around why the following query:

SELECT ref,
article_number,
count(article_number)
FROM invoice
INNER JOIN goods_list USING (invoice_number)
WHERE invoice_owner = 'someone'
GROUP BY invoice_number, article_number
LIMIT 1

is way slower than this one:

WITH base_data AS (
    SELECT invoice_number
    FROM invoice
    WHERE invoice_owner = 'someone'
    LIMIT 1
)

SELECT invoice_number,
    article_number,
    count(article_number)
FROM base_data
INNER JOIN goods_list USING (invoice_number)
GROUP BY invoice_number, article_number

Is the limit applied after the whole result set is returned?

Gordon Linoff · Accepted Answer

The first query processes all the data for the invoice owner. It does the group by and finally returns one row.

The second query gets one row in the CTE for the invoice owner up front. It joins that single row into another table and then does the aggregation on way fewer rows.

Hence, it is not surprising that the second query is much faster, because it is processing many fewer rows for the aggregation.

Note: when using limit you should also use order by. Otherwise, you can get any matching row when the code runs -- and you might even get different rows on different runs.

Why are common table expressions faster even when limiting the result?

Answers (1)

Related Questions