Scary Wombat

Reputation: 44814

explain analyze - cost to actual time relation

Usually, when improving my queries, I see a corresponding improvement in both the cost and the actual time when running an explain analyze on the before and after queries.

However, in one case, the before query reports

"Hash Join  (cost=134.06..1333.57 rows=231 width=70) 
            (actual time=115.349..115.650 rows=231 loops=1)"
<cut...>
"Planning time: 4.060 ms"
"Execution time: 115.787 ms"

and the after reports

"Hash Join  (cost=4.63..1202.61 rows=77 width=70) 
            (actual time=0.249..0.481 rows=231 loops=1)"
<cut...>
"Planning time: 2.079 ms"
"Execution time: 0.556 ms"

So as you can see, the costs are similar, but the actual times and the real execution times are vastly different, regardless of the order in which I run the tests.

Using Postgres 8.4.

Can anyone clear up my understanding as to why the cost does not show an improvement?

Upvotes: 4

Views: 8907

Answers (1)

Dhwani Katagade

Reputation: 1240

There isn't much information available in the details given in the question, but a few pointers may help others who come here searching on this topic.

  • The cost is a numerical estimate based on the table statistics that are calculated when analyze is run on the tables involved in the query. If a table has never been analyzed, the plan and its cost may be far from optimal. The query plan is driven by the table statistics.
  • The actual time is the wall-clock time taken to actually run the query. Again, this may not correlate well with the cost, depending on how fresh the table statistics are. The plan is chosen based on the current table statistics, but the actual execution may find the real data conditions different from what the statistics describe, resulting in a skewed execution time relative to the estimated cost. One way to check how fresh the statistics are is sketched below.
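As a rough check, the pg_stat_user_tables view reports when each table was last analyzed, either manually or by autovacuum. This is only a sketch; the table names are placeholders for whichever tables appear in your own query:

    -- When were the tables involved in the query last analyzed?
    -- 'orders' and 'customers' are placeholder table names.
    SELECT relname, last_analyze, last_autoanalyze
    FROM   pg_stat_user_tables
    WHERE  relname IN ('orders', 'customers');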

The point to note here is that the table statistics affect the plan and the cost estimate, whereas the plan and the actual data conditions affect the actual time. So, as a best practice, before working on query optimization, always run analyze on the tables involved.
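A minimal sketch of that workflow (the table name and the query itself are made up for illustration) is to refresh the statistics first and only then look at the estimated cost and the actual time:

    -- Refresh the statistics the planner will use.
    ANALYZE orders;

    -- Now the reported cost and the actual time are based on fresh statistics.
    EXPLAIN ANALYZE
    SELECT *
    FROM   orders
    WHERE  customer_id = 42;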

A few notes:

  • analyze <table> - updates the statistics of the table.
  • vacuum analyze <table> - removes stale versions of the updated records from the table and then updates the statistics of the table.
  • explain <query> - only generates a plan for the query using the existing statistics of the tables involved; the query itself is not executed.
  • explain (analyze) <query> - generates a plan for the query using the existing statistics of the tables involved, and also runs the query, collecting actual run time data. Since the query is actually run, if it is a DML query, care should be taken to enclose it in begin and rollback if the changes are not intended to be persisted (see the sketch after this list).
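For that last point, a minimal sketch of wrapping a DML statement so that its changes are discarded could look like this (the table, column and values are placeholders):

    BEGIN;
    -- The update really executes here, so actual times are reported ...
    EXPLAIN ANALYZE
    UPDATE orders
    SET    status = 'shipped'
    WHERE  order_id = 1001;
    -- ... but rolling back discards the changes it made.
    ROLLBACK;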

Upvotes: 9
