Fabian Zeindl
Fabian Zeindl

Reputation: 5988

Why does an unused FROM table make a statement slow

I have two tables. One changelog-table, the details of which aren't important here. Another table that holds seller-information and most importantly has two columns:

When a seller is changed, a new entry is created, ident stays the same but the new entry get's a new ID. I have a primary index on ID and another index on (ident,-id) so i can get the current data fast.

Per chance i found the following weird behaviour:

This takes veeery long to complete:

SELECT DISTINCT ON (ident) sellers.* FROM changelog, sellers ORDER BY ident,id DESC;

                                  QUERY PLAN                                    
---------------------------------------------------------------------------------
 Unique  (cost=741675.98..760122.47 rows=10 width=30)
   ->  Sort  (cost=741675.98..750899.22 rows=3689298 width=30)
         Sort Key: sellers.ident, sellers.id
         ->  Nested Loop  (cost=3.07..74457.37 rows=3689298 width=30)
               ->  Seq Scan on changelog  (cost=0.00..668.34 rows=38034 width=0)
               ->  Materialize  (cost=3.07..4.04 rows=97 width=30)
                     ->  Seq Scan on sellers  (cost=0.00..2.97 rows=97 width=30)

When i replace the DESC with -ID it's fast, but produces the same results.

SELECT DISTINCT ON (ident) sellers.* FROM changelog, sellers ORDER BY ident,-id;

                                         QUERY PLAN                                        
------------------------------------------------------------------------------------------
 Unique  (cost=706.37..92956.53 rows=10 width=30)
   ->  Nested Loop  (cost=706.37..83733.28 rows=3689298 width=30)
         ->  Index Scan using idx_sellers on sellers  (cost=0.00..17.70 rows=97 width=30)
         ->  Materialize  (cost=706.37..1086.71 rows=38034 width=0)
               ->  Seq Scan on changelog  (cost=0.00..668.34 rows=38034 width=0)

When i remove the "changelog" out of the FROM, ORDER BY -id and DESC give the same query-plan, again fast.

SELECT DISTINCT ON (ident) sellers.* FROM sellers ORDER BY ident,id, DESC   

                             QUERY PLAN                              
---------------------------------------------------------------------
 Unique  (cost=6.17..6.66 rows=10 width=30)
   ->  Sort  (cost=6.17..6.41 rows=97 width=30)
         Sort Key: ident, id
         ->  Seq Scan on sellers  (cost=0.00..2.97 rows=97 width=30)

My questions:

Edit: My real query has of course a WHERE clause to join the two tables.

Upvotes: 2

Views: 184

Answers (1)

HLGEM
HLGEM

Reputation: 96600

There is no such thing as an unused table in a query.

SELECT DISTINCT ON (ident) sellers.* 
FROM changelog, sellers ORDER BY ident,id DESC

You orginal query was creating a cross-join (every record in changelog was being joined to every record in sellers) which is one reason why you should avoid using the implied join syntax that was replaced with explicit syntax 20 years ago.

Upvotes: 9

Related Questions