How PostgreSQL execute query?

Question

Can anyone explain why PostgreSQL works so:

If I execute this query

SELECT
*
FROM project_archive_doc as PAD, project_archive_doc as PAD2
WHERE
PAD.id = PAD2.id

it will be simple JOIN and EXPLAIN will looks like this:

Hash Join  (cost=6.85..13.91 rows=171 width=150)
  Hash Cond: (pad.id = pad2.id)
  ->  Seq Scan on project_archive_doc pad  (cost=0.00..4.71 rows=171 width=75)
  ->  Hash  (cost=4.71..4.71 rows=171 width=75)
        ->  Seq Scan on project_archive_doc pad2  (cost=0.00..4.71 rows=171 width=75)

But if I will execute this query:

SELECT *
FROM project_archive_doc as PAD
WHERE
PAD.id = (
          SELECT PAD2.id
          FROM project_archive_doc as PAD2
          WHERE
          PAD2.project_id = PAD.project_id
          ORDER BY PAD2.created_at
          LIMIT 1)

there will be no joins and EXPLAIN looks like:

Seq Scan on project_archive_doc pad  (cost=0.00..886.22 rows=1 width=75)"
  Filter: (id = (SubPlan 1))
  SubPlan 1
    ->  Limit  (cost=5.15..5.15 rows=1 width=8)
          ->  Sort  (cost=5.15..5.15 rows=1 width=8)
                Sort Key: pad2.created_at
                ->  Seq Scan on project_archive_doc pad2  (cost=0.00..5.14 rows=1 width=8)
                      Filter: (project_id = pad.project_id)

Why it is so and is there any documentation or articles about this?

Craig Ringer · Accepted Answer

Without table definitions and data it's hard to be specific for this case. In general, PostgreSQL is like most SQL databases in that it doesn't treat SQL as a step-by-step program for how to execute a query. It's more like a description of what you want the results to be and a hint about how you want the database to produce those results.

PostgreSQL is free to actually execute the query however it can most efficiently do so, so long as it produces the results you want.

Often it has several choices about how to produce a particular result. It will choose between them based on cost estimates.

It can also "understand" that several different ways of writing a particular query are equivalent, and transform one into another where it's more efficient. For example, it can transform an IN (SELECT ...) into a join, because it can prove they're equivalent.

However, sometimes apparently small changes to a query fundamentally change its meaning, and limit what optimisations/transformations PostgreSQL can make. Adding a LIMIT or OFFSET inside a subquery prevents PostgreSQL from flattening it, i.e. combining it with the outer query by tranforming it into a join. It also prevents PostgreSQL from moving WHERE clause entries between the subquery and outer query, because that'd change the meaning of the query. Without a LIMIT or OFFSET clause, it can do both these things because they don't change the query's meaning.

There's some info on the planner here.

How PostgreSQL execute query?

Answers (1)

Related Questions