Mich Talebzadeh

Reputation: 125

Measuring the average elapsed time for SQL code running in Google BigQuery

Because BigQuery is a shared resource, running the same code can produce different elapsed times from run to run. One option I always use is to turn off caching under Query Settings, Cache preference, so that queries are not served from the cache. The problem with this setting is that if you refresh the browser or leave it idle, the Cache preference box gets ticked again.

Anyhow, I had a discussion with some developers who are optimizing the code. In a nutshell, they take the slow-running code, run it 5 times, and take the average; then, after optimization, they run the code again 5 times to get an average value for the optimized SQL. The details are not clear to me. However, my preference would be (all in the BQ console):

  1. Create a user session.
  2. Turn off SQL caching.
  3. In the BQ console, paste the slow-running code.
  4. In the same session, paste the optimized code.
  5. Run both pieces of code, separated by ";" (a rough sketch of this workflow follows below).
  6. This ensures that any systematics, such as BQ being busy/overloaded or a slow connection, affect BOTH pieces of SQL equally, so the systematics cancel out. In my opinion one only needs to run the comparison once, since caching is turned off as well. Running 5 times to get an average looks excessive and superfluous?
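For illustration, here is a rough sketch of that workflow using the google-cloud-bigquery Python client instead of the console. The two query strings are placeholders, and it assumes default credentials and a default project:

    # Sketch: time a slow query and its optimized version in one session,
    # with the query cache disabled. The query strings are placeholders.
    from google.cloud import bigquery

    client = bigquery.Client()  # assumes default credentials/project

    # 1. Create a user session so both queries share the same context.
    session_job = client.query(
        "SELECT 1",
        job_config=bigquery.QueryJobConfig(create_session=True),
    )
    session_job.result()
    session_id = session_job.session_info.session_id

    # 2. Reuse that session and turn off the query cache for each run.
    job_config = bigquery.QueryJobConfig(
        use_query_cache=False,
        connection_properties=[
            bigquery.ConnectionProperty("session_id", session_id)
        ],
    )

    slow_sql = "SELECT ..."       # placeholder: the slow-running query
    optimized_sql = "SELECT ..."  # placeholder: the optimized query

    for label, sql in [("slow", slow_sql), ("optimized", optimized_sql)]:
        job = client.query(sql, job_config=job_config)
        job.result()  # block until the job finishes
        elapsed = (job.ended - job.started).total_seconds()
        print(f"{label}: {elapsed:.2f}s, {job.total_bytes_processed} bytes")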

I'd appreciate any suggestions/feedback.

Thanks

Upvotes: 0

Views: 181

Answers (1)

Pentium10

Reputation: 207962

Measuring the time is one way; the other way to see whether a query has been optimized is to understand the query plan and whether slots are used effectively.
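For example, with the Python client you can read the slot and bytes statistics off a completed job; lower slot-milliseconds and fewer bytes processed for the same result generally indicate a better-optimized query (the query string here is a placeholder):

    # Sketch: job-level statistics that indicate how efficiently a
    # query ran. The query string is a placeholder.
    from google.cloud import bigquery

    client = bigquery.Client()
    job = client.query("SELECT ...")  # placeholder query
    job.result()

    print("cache hit:        ", job.cache_hit)
    print("slot-milliseconds:", job.slot_millis)
    print("bytes processed:  ", job.total_bytes_processed)
    print("bytes billed:     ", job.total_bytes_billed)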

I've been working with BigQuery for more than 6 years, and I have never used the approach you describe. In BigQuery what actually matters is reducing cost, and that can be done by iteratively rewriting the query and by using partitioning/clustering/materialized views and caching/temporary tables.
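As a hypothetical illustration of that kind of rewrite, partitioning and clustering a table lets filtered queries scan fewer bytes (the dataset, table, and column names below are made up):

    # Hypothetical example: recreate a table with partitioning and
    # clustering so that filtered queries prune data instead of
    # scanning the whole table. All names here are made up.
    from google.cloud import bigquery

    client = bigquery.Client()
    client.query("""
        CREATE TABLE my_dataset.events_partitioned
        PARTITION BY DATE(event_ts)
        CLUSTER BY user_id
        AS SELECT * FROM my_dataset.events
    """).result()

    # A filter on the partitioning column now prunes partitions:
    rows = client.query("""
        SELECT COUNT(*) AS n
        FROM my_dataset.events_partitioned
        WHERE DATE(event_ts) = '2024-01-01'
    """).result()
    print(list(rows)[0].n)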

Upvotes: 1
