Reputation: 832
Can anyone explain why PostgreSQL works so:
If I execute this query
SELECT
*
FROM project_archive_doc as PAD, project_archive_doc as PAD2
WHERE
PAD.id = PAD2.id
it will be simple JOIN
and EXPLAIN
will looks like this:
Hash Join (cost=6.85..13.91 rows=171 width=150)
Hash Cond: (pad.id = pad2.id)
-> Seq Scan on project_archive_doc pad (cost=0.00..4.71 rows=171 width=75)
-> Hash (cost=4.71..4.71 rows=171 width=75)
-> Seq Scan on project_archive_doc pad2 (cost=0.00..4.71 rows=171 width=75)
But if I will execute this query:
SELECT *
FROM project_archive_doc as PAD
WHERE
PAD.id = (
SELECT PAD2.id
FROM project_archive_doc as PAD2
WHERE
PAD2.project_id = PAD.project_id
ORDER BY PAD2.created_at
LIMIT 1)
there will be no joins and EXPLAIN
looks like:
Seq Scan on project_archive_doc pad (cost=0.00..886.22 rows=1 width=75)"
Filter: (id = (SubPlan 1))
SubPlan 1
-> Limit (cost=5.15..5.15 rows=1 width=8)
-> Sort (cost=5.15..5.15 rows=1 width=8)
Sort Key: pad2.created_at
-> Seq Scan on project_archive_doc pad2 (cost=0.00..5.14 rows=1 width=8)
Filter: (project_id = pad.project_id)
Why it is so and is there any documentation or articles about this?
Upvotes: 2
Views: 660
Reputation: 324751
Without table definitions and data it's hard to be specific for this case. In general, PostgreSQL is like most SQL databases in that it doesn't treat SQL as a step-by-step program for how to execute a query. It's more like a description of what you want the results to be and a hint about how you want the database to produce those results.
PostgreSQL is free to actually execute the query however it can most efficiently do so, so long as it produces the results you want.
Often it has several choices about how to produce a particular result. It will choose between them based on cost estimates.
It can also "understand" that several different ways of writing a particular query are equivalent, and transform one into another where it's more efficient. For example, it can transform an IN (SELECT ...)
into a join, because it can prove they're equivalent.
However, sometimes apparently small changes to a query fundamentally change its meaning, and limit what optimisations/transformations PostgreSQL can make. Adding a LIMIT
or OFFSET
inside a subquery prevents PostgreSQL from flattening it, i.e. combining it with the outer query by tranforming it into a join. It also prevents PostgreSQL from moving WHERE
clause entries between the subquery and outer query, because that'd change the meaning of the query. Without a LIMIT
or OFFSET
clause, it can do both these things because they don't change the query's meaning.
There's some info on the planner here.
Upvotes: 2