Reputation: 33
https://wiki.postgresql.org/wiki/Introduction_to_VACUUM,_ANALYZE,_EXPLAIN,_and_COUNT
I am currently reading this page to understand EXPLAIN ANALYZE for postgreSQL, and I am trying to understand the relation between the estimation cost and actual time.
A simple example given in this page is as following:
-> Nested Loop (cost=5.64..14.71 rows=1 width=140) (actual time=18.983..19.481 rows=4 loops=1)
-> Hash Join (cost=5.64..8.82 rows=1 width=72) (actual time=18.876..19.212 rows=4 loops=1)
-> Index Scan using pg_class_oid_index on pg_class i (cost=0.00..5.88 rows=1 width=72) (actual time=0.051..0.055 rows=1 loops=4)
and it says " If you do the math, you'll see that 0.055 * 4 accounts for most of the difference between the total time of the hash join and the total time of the nested loop (the remainder is likely the overhead of measuring all of this)."
I am not sure what does the "difference" here stand for and i cannot really find any difference which is close to 0.055 *4.. Am i silly that just ignore some trivial result?
Btw i am actually writing a lab report on databases, so generally if being asked to write down some brief comment on estimated and actual time based on some specific result, what can i say?
this is the plan i need to write result on:
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=39911.52..299300.41 rows=1 width=17) (actual time=4660.217..4952.328 rows=1 loops=1)
Join Filter: (casts.mid = movie.id)
Rows Removed by Join Filter: 2251735
-> Seq Scan on movie (cost=0.00..29721.64 rows=5542 width=21) (actual time=0.637..316.651 rows=4201 loops=1)
Filter: (year > 2010)
Rows Removed by Filter: 1533210
-> Materialize (cost=39911.52..269080.01 rows=6 width=4) (actual time=0.307..1.014 rows=536 loops=4201)
-> Hash Join (cost=39911.52..269079.98 rows=6 width=4) (actual time=1288.827..4089.872 rows=536 loops=1)
Hash Cond: (casts.pid = actor.id)
-> Seq Scan on casts (cost=0.00..186246.47 rows=11445847 width=8) (actual time=0.293..1487.138 rows=11445847 loops=1)
-> Hash (cost=39911.51..39911.51 rows=1 width=4) (actual time=414.130..414.130 rows=1 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 1kB
-> Seq Scan on actor (cost=0.00..39911.51 rows=1 width=4) (actual time=100.175..414.125 rows=1 loops=1)
Filter: (((fname)::text = 'Tom'::text) AND ((lname)::text = 'Hanks'::text))
Rows Removed by Filter: 1865033
Total runtime: 4952.822 ms
Upvotes: 3
Views: 2777
Reputation: 36107
Look at actual times:
-> Nested Loop ........ (actual time=18.983..19.481 rows=4 loops=1) ..... ..... -> Hash Join ....... (actual time=18.876..19.212 rows=4 loops=1) -> Index Scan ......... (actual time=0.051..0.055 rows=1 loops=4)
4 (loops) * 0.055 = 0.22
19.212 + 0.22 = 19.432 ==> almost 19.481 (missing 0.049)
EDIT
I think that adding an index on actor( fname + lname )
,
or even only on one column actor( lname )
, can speed up this query dramatically.
Look at this:
-> Seq Scan on actor (cost=0.00..39911.51 rows=1 width=4) (actual time=100.175..414.125 rows=1 loops=1)
Filter: (((fname)::text = 'Tom'::text) AND ((lname)::text = 'Hanks'::text))
Rows Removed by Filter: 1865033
PostgreSQL performs a sequential scan on the actos
table, and filters out 1865033 rows to find only 1 row. The total time of this scan is from 100 to 414 seconds.
One row can be found in a couple of miliseconds when using an index.
Upvotes: 3