Netherstorm
Netherstorm

Reputation: 163

Query Plan: Is the order of JOINS important

I want to check if the order of JOINS is important in a SQL Querie for runtime and efficiency.

I'm using PostgreSQL and for my check I used the example world db from MYSQL (https://downloads.mysql.com/docs/world.sql.zip) and wrote these two statements:

Querie 1:

EXPLAIN ANALYSE SELECT * FROM countrylanguage
    JOIN city ON city.countrycode = countrylanguage.countrycode
    JOIN country c ON city.countrycode = c.code

Querie 2:

EXPLAIN ANALYSE SELECT * FROM city
    JOIN country c ON c.code = city.countrycode
    JOIN countrylanguage c2 on c.code = c2.countrycode

Query Plan 1: enter image description here

Hash Join  (cost=41.14..484.78 rows=29946 width=161) (actual time=1.472..17.602 rows=30670 loops=1)
  Hash Cond: (city.countrycode = countrylanguage.countrycode)
  ->  Seq Scan on city  (cost=0.00..72.79 rows=4079 width=31) (actual time=0.062..1.220 rows=4079 loops=1)
  ->  Hash  (cost=28.84..28.84 rows=984 width=130) (actual time=1.378..1.378 rows=984 loops=1)
        Buckets: 1024  Batches: 1  Memory Usage: 172kB
        ->  Hash Join  (cost=10.38..28.84 rows=984 width=130) (actual time=0.267..0.823 rows=984 loops=1)
              Hash Cond: (countrylanguage.countrycode = c.code)
              ->  Seq Scan on countrylanguage  (cost=0.00..15.84 rows=984 width=17) (actual time=0.029..0.158 rows=984 loops=1)
              ->  Hash  (cost=7.39..7.39 rows=239 width=113) (actual time=0.220..0.220 rows=239 loops=1)
                    Buckets: 1024  Batches: 1  Memory Usage: 44kB
                    ->  Seq Scan on country c  (cost=0.00..7.39 rows=239 width=113) (actual time=0.013..0.137 rows=239 loops=1)
Planning Time: 3.818 ms
Execution Time: 18.801 ms

Query Plan 2: enter image description here

Hash Join  (cost=41.14..312.47 rows=16794 width=161) (actual time=2.415..18.628 rows=30670 loops=1)
  Hash Cond: (city.countrycode = c.code)
  ->  Seq Scan on city  (cost=0.00..72.79 rows=4079 width=31) (actual time=0.032..0.574 rows=4079 loops=1)
  ->  Hash  (cost=28.84..28.84 rows=984 width=130) (actual time=2.364..2.364 rows=984 loops=1)
        Buckets: 1024  Batches: 1  Memory Usage: 171kB
        ->  Hash Join  (cost=10.38..28.84 rows=984 width=130) (actual time=0.207..1.307 rows=984 loops=1)
              Hash Cond: (c2.countrycode = c.code)
              ->  Seq Scan on countrylanguage c2  (cost=0.00..15.84 rows=984 width=17) (actual time=0.027..0.204 rows=984 loops=1)
              ->  Hash  (cost=7.39..7.39 rows=239 width=113) (actual time=0.163..0.163 rows=239 loops=1)
                    Buckets: 1024  Batches: 1  Memory Usage: 44kB
                    ->  Seq Scan on country c  (cost=0.00..7.39 rows=239 width=113) (actual time=0.015..0.049 rows=239 loops=1)
Planning Time: 1.901 ms
Execution Time: 19.694 ms

The estimated costs and rows are different and the last hash condition is different. Does this mean, that the Query Planner hasn't done the same thing for both queries, or am I on the wrong track?

Thanks for your help!

Upvotes: 1

Views: 129

Answers (2)

wildplasser
wildplasser

Reputation: 44250

  • (as said) the queries are not identical
  • though not identical, the plans are comparable
  • both queries execute in 18ms, comparing them is close to useless
  • queries on tables with insufficient structure(keys,indexes,statistics), but with a small enough footprint (work_mem) will always result in hash-joins.

Upvotes: 2

Gordon Linoff
Gordon Linoff

Reputation: 1270301

The issue is not the order of the joins but that the join conditions are different -- referring to different tables.

In the first query, you are joining to countrylanguage using the country code from city. In the second, you are using the country code from country.

With inner joins this should not make a difference to the final result. However, it clearly does affect how the optimizer considers different paths.

Upvotes: 2

Related Questions