Why does an unused FROM table make a statement slow

Question

I have two tables. One changelog-table, the details of which aren't important here. Another table that holds seller-information and most importantly has two columns:

a primary ID
and a id called ident (= the realworld ID).

When a seller is changed, a new entry is created, ident stays the same but the new entry get's a new ID. I have a primary index on ID and another index on (ident,-id) so i can get the current data fast.

Per chance i found the following weird behaviour:

This takes veeery long to complete:

SELECT DISTINCT ON (ident) sellers.* FROM changelog, sellers ORDER BY ident,id DESC;

                                  QUERY PLAN                                    
---------------------------------------------------------------------------------
 Unique  (cost=741675.98..760122.47 rows=10 width=30)
   ->  Sort  (cost=741675.98..750899.22 rows=3689298 width=30)
         Sort Key: sellers.ident, sellers.id
         ->  Nested Loop  (cost=3.07..74457.37 rows=3689298 width=30)
               ->  Seq Scan on changelog  (cost=0.00..668.34 rows=38034 width=0)
               ->  Materialize  (cost=3.07..4.04 rows=97 width=30)
                     ->  Seq Scan on sellers  (cost=0.00..2.97 rows=97 width=30)

When i replace the DESC with -ID it's fast, but produces the same results.

SELECT DISTINCT ON (ident) sellers.* FROM changelog, sellers ORDER BY ident,-id;

                                         QUERY PLAN                                        
------------------------------------------------------------------------------------------
 Unique  (cost=706.37..92956.53 rows=10 width=30)
   ->  Nested Loop  (cost=706.37..83733.28 rows=3689298 width=30)
         ->  Index Scan using idx_sellers on sellers  (cost=0.00..17.70 rows=97 width=30)
         ->  Materialize  (cost=706.37..1086.71 rows=38034 width=0)
               ->  Seq Scan on changelog  (cost=0.00..668.34 rows=38034 width=0)

When i remove the "changelog" out of the FROM, ORDER BY -id and DESC give the same query-plan, again fast.

SELECT DISTINCT ON (ident) sellers.* FROM sellers ORDER BY ident,id, DESC   

                             QUERY PLAN                              
---------------------------------------------------------------------
 Unique  (cost=6.17..6.66 rows=10 width=30)
   ->  Sort  (cost=6.17..6.41 rows=97 width=30)
         Sort Key: ident, id
         ->  Seq Scan on sellers  (cost=0.00..2.97 rows=97 width=30)

My questions:

Why does including a unused table in the FROM affect the query at all?
Why does ORDER BY ident,-id not use the same plan as ORDER BY ident,id DESC?

Edit: My real query has of course a WHERE clause to join the two tables.

HLGEM · Accepted Answer

There is no such thing as an unused table in a query.

SELECT DISTINCT ON (ident) sellers.* 
FROM changelog, sellers ORDER BY ident,id DESC

You orginal query was creating a cross-join (every record in changelog was being joined to every record in sellers) which is one reason why you should avoid using the implied join syntax that was replaced with explicit syntax 20 years ago.

Why does an unused FROM table make a statement slow

Answers (1)

Related Questions