BigQuery - Populating SELECT fields from results of a temp function

Question

In Google BigQuery, I have a query that has the same field name appearing multiple times in various join subqueries. I would like to abstract out this field name into a temporary function such that it will amend it in all places if I change it within the function only.

This is the query I have:

SELECT *
FROM

(SELECT field1, COUNT(*) sq1_total
FROM table
WHERE condition = 1
GROUP BY field 1) sq1

LEFT JOIN

(SELECT field1, COUNT(*) sq2_total
FROM table
WHERE condition = 0
GROUP BY field 1) sq2

USING(field1)

This is what I would like to have:

CREATE TEMP FUNCTION replace_field_name() AS (...);

SELECT *
FROM

(SELECT replace_field_name(), COUNT(*) sq1_total
FROM table
WHERE condition = 1
GROUP BY replace_field_name()) sq1

LEFT JOIN

(SELECT replace_field_name(), COUNT(*) sq2_total
FROM table
WHERE condition = 0
GROUP BY replace_field_name()) sq2

USING(replace_field_name())

So that when I want to compare many different fields like this, I only need to change the field name in one place as opposed to five places.

Is this possible?

Mikhail Berlyant · Accepted Answer

Below thoughts/proposals relevant in terms of BigQuery Standard SQL

I would like to abstract out this field name into a temporary function ...

As Tim mentioned in his comment - it is quite not possible to do in a way you mock it

I want to compare many different fields like this, I only need to change the field name in one place as opposed to five places.

You can try to re-write your query in such a way that you will need to change field name in less places, like in below examples

#standardSQL
SELECT * FROM (SELECT field1, COUNT(*) sq1_total FROM `project.dataset.table` WHERE condition = 1 GROUP BY 1) sq1
    LEFT JOIN (SELECT field1, COUNT(*) sq2_total FROM `project.dataset.table` WHERE condition = 0 GROUP BY 1) sq2
USING (field1)

OR

#standardSQL
SELECT DISTINCT field1, 
  COUNTIF(condition = 1) OVER(PARTITION BY field1) sq1_total, 
  COUNTIF(condition = 0) OVER(PARTITION BY field1) sq2_total
FROM `project.dataset.table`

In bothe above queries - there are "just" three place to replace field name in (as opposed to 5 in original query)

Obviously - this does not address the problem in qualitative way - just quantitatively

Is this possible?

Good news - there is always work around - but usually it requires to slightly change something in your requirements, expectations

For example in below solution you need to set field name only once!!! in UNNEST(['field1']) field line

#standardSQL
SELECT DISTINCT field, value,
  COUNTIF(condition = 1) OVER(PARTITION BY field, value) sq1_total, 
  COUNTIF(condition = 0) OVER(PARTITION BY field, value) sq2_total
FROM (
  SELECT field, REGEXP_EXTRACT(x, CONCAT(r'"', field, '":"?([^",])"?')) value, condition
  FROM `project.dataset.table` t,
  UNNEST([TO_JSON_STRING(t)]) x,
  UNNEST(['field1']) field
)

the "price" is - you will have output in form of (with dummy data)

Row field   value   sq1_total   sq2_total    
1   field1  1       1           3    
2   field1  2       1           0

instead of output from original query

Row field1  sq1_total   sq2_total    
1   1       1           3    
2   2       1           null

I want to compare many different fields like this ...

The extra value in above approach is that you can run your comparison (for as many fields as you want) in one shot - by adding needed fields' names into UNNEST(['field1']) field list as in below example

#standardSQL
SELECT DISTINCT field, value,
  COUNTIF(condition = 1) OVER(PARTITION BY field, value) sq1_total, 
  COUNTIF(condition = 0) OVER(PARTITION BY field, value) sq2_total
FROM (
  SELECT  field, REGEXP_EXTRACT(x, CONCAT(r'"', field, '":"?([^",])"?')) value, condition
  FROM `project.dataset.table` t,
  UNNEST([TO_JSON_STRING(t)]) x,
  UNNEST(['field1', 'field2']) field
)
-- ORDER BY field, value

so result could look like

Row field   value   sq1_total   sq2_total    
1   field1  1       1           3    
2   field1  2       1           0    
3   field2  1       1           1    
4   field2  2       0           2    
5   field2  3       1           0

BigQuery - Populating SELECT fields from results of a temp function

Answers (1)

Related Questions