Reputation: 463
I recently had to generate a BQ table out of other BQ tables. The logic was rather involved and I ended up writing a complex SQL statement.
In Oracle SQL I would have written a PL/SQL procedure with the logic broken down into separate pieces (most often merge statements). In some cases I would encapsulate some code into functions. The resulting procedure would be a sequence of DML statements, easy to read and maintain.
However nothing similar exists for BQ. The UDF's are only temporary and cannot be stored within -say- a view.
Question: I am looking for ways to make my complex BQ SQL code more modular and readable. Is there any way I could accomplish this?
Upvotes: 2
Views: 1435
Reputation: 173190
currently available option is to use WITH Clause
The WITH clause contains one or more named subqueries whose output acts as a temporary table which subsequent SELECT statements can reference in any clause or subquer
I would still consider User-Defined Functions
as a really good option.
JS and SQL UDF are available in BigQuery and from what is known BigQuery team is working on introducing permanent UDF
to be available soon
Meantime you can just store body of JS UDF as a js library and reference it in your UDF using OPTIONS section. see Including external libraries
in above reference
October 2019 Update
The ability to use scripting and stored procedures is now in Beta.
So, you can send multiple statements to BigQuery in one request, to use variables, and to use control flow statements such as IF and WHILE, etc.
And, you can use procedure, which is a block of statements that can be called from other queries.
Note: it is Beta yet
Upvotes: 4
Reputation: 33765
BigQuery supports persistent user-defined functions. To get started, see the documentation.
For example, here's a CREATE FUNCTION
statement that creates a function to compute the median of an array:
CREATE FUNCTION dataset.median(arr ANY TYPE) AS (
(
SELECT
IF(
MOD(ARRAY_LENGTH(arr), 2) = 0,
(arr[OFFSET(DIV(ARRAY_LENGTH(arr), 2) - 1)] + arr[OFFSET(DIV(ARRAY_LENGTH(arr), 2))]) / 2,
arr[OFFSET(DIV(ARRAY_LENGTH(arr), 2))]
)
FROM (SELECT ARRAY_AGG(x ORDER BY x) AS arr FROM UNNEST(arr) AS x)
)
);
After executing this statement, you can reference it in a follow-up query:
SELECT dataset.median([7, 1, 2, 10]) AS median;
You can also reference the function inside logical views. Note that you currently need to qualify the reference to the function inside the view using a project, however:
CREATE VIEW dataset.sampleview AS
SELECT x, `project-name`.dataset.median(array_column) AS median
FROM `project-name`.dataset.table
Upvotes: 1