Reputation: 711
In Google BigQuery, I have a query that has the same field name appearing multiple times in various join subqueries. I would like to abstract out this field name into a temporary function such that it will amend it in all places if I change it within the function only.
This is the query I have:
SELECT *
FROM
(SELECT field1, COUNT(*) sq1_total
FROM table
WHERE condition = 1
GROUP BY field 1) sq1
LEFT JOIN
(SELECT field1, COUNT(*) sq2_total
FROM table
WHERE condition = 0
GROUP BY field 1) sq2
USING(field1)
This is what I would like to have:
CREATE TEMP FUNCTION replace_field_name() AS (...);
SELECT *
FROM
(SELECT replace_field_name(), COUNT(*) sq1_total
FROM table
WHERE condition = 1
GROUP BY replace_field_name()) sq1
LEFT JOIN
(SELECT replace_field_name(), COUNT(*) sq2_total
FROM table
WHERE condition = 0
GROUP BY replace_field_name()) sq2
USING(replace_field_name())
So that when I want to compare many different fields like this, I only need to change the field name in one place as opposed to five places.
Is this possible?
Upvotes: 1
Views: 656
Reputation: 173046
Below thoughts/proposals relevant in terms of BigQuery Standard SQL
I would like to abstract out this field name into a temporary function ...
As Tim mentioned in his comment - it is quite not possible to do in a way you mock it
I want to compare many different fields like this, I only need to change the field name in one place as opposed to five places.
You can try to re-write your query in such a way that you will need to change field name in less places, like in below examples
#standardSQL
SELECT * FROM (SELECT field1, COUNT(*) sq1_total FROM `project.dataset.table` WHERE condition = 1 GROUP BY 1) sq1
LEFT JOIN (SELECT field1, COUNT(*) sq2_total FROM `project.dataset.table` WHERE condition = 0 GROUP BY 1) sq2
USING (field1)
OR
#standardSQL
SELECT DISTINCT field1,
COUNTIF(condition = 1) OVER(PARTITION BY field1) sq1_total,
COUNTIF(condition = 0) OVER(PARTITION BY field1) sq2_total
FROM `project.dataset.table`
In bothe above queries - there are "just" three place to replace field name in (as opposed to 5 in original query)
Obviously - this does not address the problem in qualitative way - just quantitatively
Is this possible?
Good news - there is always work around - but usually it requires to slightly change something in your requirements, expectations
For example in below solution you need to set field name only once!!! in UNNEST(['field1']) field
line
#standardSQL
SELECT DISTINCT field, value,
COUNTIF(condition = 1) OVER(PARTITION BY field, value) sq1_total,
COUNTIF(condition = 0) OVER(PARTITION BY field, value) sq2_total
FROM (
SELECT field, REGEXP_EXTRACT(x, CONCAT(r'"', field, '":"?([^",])"?')) value, condition
FROM `project.dataset.table` t,
UNNEST([TO_JSON_STRING(t)]) x,
UNNEST(['field1']) field
)
the "price" is - you will have output in form of (with dummy data)
Row field value sq1_total sq2_total
1 field1 1 1 3
2 field1 2 1 0
instead of output from original query
Row field1 sq1_total sq2_total
1 1 1 3
2 2 1 null
I want to compare many different fields like this ...
The extra value in above approach is that you can run your comparison (for as many fields as you want) in one shot - by adding needed fields' names into UNNEST(['field1']) field
list as in below example
#standardSQL
SELECT DISTINCT field, value,
COUNTIF(condition = 1) OVER(PARTITION BY field, value) sq1_total,
COUNTIF(condition = 0) OVER(PARTITION BY field, value) sq2_total
FROM (
SELECT field, REGEXP_EXTRACT(x, CONCAT(r'"', field, '":"?([^",])"?')) value, condition
FROM `project.dataset.table` t,
UNNEST([TO_JSON_STRING(t)]) x,
UNNEST(['field1', 'field2']) field
)
-- ORDER BY field, value
so result could look like
Row field value sq1_total sq2_total
1 field1 1 1 3
2 field1 2 1 0
3 field2 1 1 1
4 field2 2 0 2
5 field2 3 1 0
Upvotes: 2