Reputation: 163
I want to check if the order of JOINS is important in a SQL Querie for runtime and efficiency.
I'm using PostgreSQL and for my check I used the example world db from MYSQL (https://downloads.mysql.com/docs/world.sql.zip) and wrote these two statements:
Querie 1:
EXPLAIN ANALYSE SELECT * FROM countrylanguage
JOIN city ON city.countrycode = countrylanguage.countrycode
JOIN country c ON city.countrycode = c.code
Querie 2:
EXPLAIN ANALYSE SELECT * FROM city
JOIN country c ON c.code = city.countrycode
JOIN countrylanguage c2 on c.code = c2.countrycode
Hash Join (cost=41.14..484.78 rows=29946 width=161) (actual time=1.472..17.602 rows=30670 loops=1)
Hash Cond: (city.countrycode = countrylanguage.countrycode)
-> Seq Scan on city (cost=0.00..72.79 rows=4079 width=31) (actual time=0.062..1.220 rows=4079 loops=1)
-> Hash (cost=28.84..28.84 rows=984 width=130) (actual time=1.378..1.378 rows=984 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 172kB
-> Hash Join (cost=10.38..28.84 rows=984 width=130) (actual time=0.267..0.823 rows=984 loops=1)
Hash Cond: (countrylanguage.countrycode = c.code)
-> Seq Scan on countrylanguage (cost=0.00..15.84 rows=984 width=17) (actual time=0.029..0.158 rows=984 loops=1)
-> Hash (cost=7.39..7.39 rows=239 width=113) (actual time=0.220..0.220 rows=239 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 44kB
-> Seq Scan on country c (cost=0.00..7.39 rows=239 width=113) (actual time=0.013..0.137 rows=239 loops=1)
Planning Time: 3.818 ms
Execution Time: 18.801 ms
Hash Join (cost=41.14..312.47 rows=16794 width=161) (actual time=2.415..18.628 rows=30670 loops=1)
Hash Cond: (city.countrycode = c.code)
-> Seq Scan on city (cost=0.00..72.79 rows=4079 width=31) (actual time=0.032..0.574 rows=4079 loops=1)
-> Hash (cost=28.84..28.84 rows=984 width=130) (actual time=2.364..2.364 rows=984 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 171kB
-> Hash Join (cost=10.38..28.84 rows=984 width=130) (actual time=0.207..1.307 rows=984 loops=1)
Hash Cond: (c2.countrycode = c.code)
-> Seq Scan on countrylanguage c2 (cost=0.00..15.84 rows=984 width=17) (actual time=0.027..0.204 rows=984 loops=1)
-> Hash (cost=7.39..7.39 rows=239 width=113) (actual time=0.163..0.163 rows=239 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 44kB
-> Seq Scan on country c (cost=0.00..7.39 rows=239 width=113) (actual time=0.015..0.049 rows=239 loops=1)
Planning Time: 1.901 ms
Execution Time: 19.694 ms
The estimated costs and rows are different and the last hash condition is different. Does this mean, that the Query Planner hasn't done the same thing for both queries, or am I on the wrong track?
Thanks for your help!
Upvotes: 1
Views: 129
Reputation: 44250
Upvotes: 2
Reputation: 1270301
The issue is not the order of the join
s but that the join
conditions are different -- referring to different tables.
In the first query, you are joining to countrylanguage
using the country code from city
. In the second, you are using the country code from country
.
With inner joins this should not make a difference to the final result. However, it clearly does affect how the optimizer considers different paths.
Upvotes: 2