Big Query User Defined Function dramatically slows down the query

Question

I have a long query written in standard SQL dialect where I join multiple tables together. I had performance issue and my query was taking at least 9.5 seconds to run. I turned off caching and began tweaking the query to make it faster, but only saw marginal progress.

I had a User Defined Function (UDF) defined at the top of the query to handle division by 0 as follows:

CREATE TEMPORARY FUNCTION divide(numerator FLOAT64, denominator FLOAT64)
RETURNS STRING
LANGUAGE js AS """
  return (denominator === 0) ? "Not Available" : (100 * (numerator/denominator)) + "%";
""";

I noticed that I was not calling the UDF anywhere in the query anymore, so I removed it. To my surprise the query ran much faster after the removal. I ran it a bunch of times and it never took more than 4.5 seconds.

Why does just creating a UDF add so much to the run time of the query?

Elliott Brossard · Accepted Answer

When a user-defined JavaScript function is present in the query text, BigQuery initializes a JavaScript environment with the function's contents on every shard of execution. There is (at the time of this writing) no optimization to avoid loading the environment if the function is not referenced, since the expectation is that if there is a JavaScript UDF present, the intent is probably to use it. The discrepancy that you are seeing is due to the start-up time of the JavaScript environment.

With SQL UDFs, however, the story is different. While BigQuery still has to parse the SQL UDFs regardless of whether you use them in order to figure out where the actual query starts, there is minimal overhead associated with that.

Big Query User Defined Function dramatically slows down the query

Answers (2)

Related Questions