denim
denim

Reputation: 463

Encapsulating complex code in BigQuery

I recently had to generate a BQ table out of other BQ tables. The logic was rather involved and I ended up writing a complex SQL statement.

In Oracle SQL I would have written a PL/SQL procedure with the logic broken down into separate pieces (most often merge statements). In some cases I would encapsulate some code into functions. The resulting procedure would be a sequence of DML statements, easy to read and maintain.

However nothing similar exists for BQ. The UDF's are only temporary and cannot be stored within -say- a view.

Question: I am looking for ways to make my complex BQ SQL code more modular and readable. Is there any way I could accomplish this?

Upvotes: 2

Views: 1435

Answers (2)

Mikhail Berlyant
Mikhail Berlyant

Reputation: 173190

currently available option is to use WITH Clause

The WITH clause contains one or more named subqueries whose output acts as a temporary table which subsequent SELECT statements can reference in any clause or subquer

I would still consider User-Defined Functions as a really good option.
JS and SQL UDF are available in BigQuery and from what is known BigQuery team is working on introducing permanent UDF to be available soon

Meantime you can just store body of JS UDF as a js library and reference it in your UDF using OPTIONS section. see Including external libraries in above reference

October 2019 Update

The ability to use scripting and stored procedures is now in Beta.

So, you can send multiple statements to BigQuery in one request, to use variables, and to use control flow statements such as IF and WHILE, etc.
And, you can use procedure, which is a block of statements that can be called from other queries.

Note: it is Beta yet

Upvotes: 4

Elliott Brossard
Elliott Brossard

Reputation: 33765

BigQuery supports persistent user-defined functions. To get started, see the documentation.

For example, here's a CREATE FUNCTION statement that creates a function to compute the median of an array:

CREATE FUNCTION dataset.median(arr ANY TYPE) AS (
  (
    SELECT
      IF(
        MOD(ARRAY_LENGTH(arr), 2) = 0,
        (arr[OFFSET(DIV(ARRAY_LENGTH(arr), 2) - 1)] + arr[OFFSET(DIV(ARRAY_LENGTH(arr), 2))]) / 2,
        arr[OFFSET(DIV(ARRAY_LENGTH(arr), 2))]
      )
    FROM (SELECT ARRAY_AGG(x ORDER BY x) AS arr FROM UNNEST(arr) AS x)
  )
);

After executing this statement, you can reference it in a follow-up query:

SELECT dataset.median([7, 1, 2, 10]) AS median;

You can also reference the function inside logical views. Note that you currently need to qualify the reference to the function inside the view using a project, however:

CREATE VIEW dataset.sampleview AS
SELECT x, `project-name`.dataset.median(array_column) AS median
FROM `project-name`.dataset.table

Upvotes: 1

Related Questions