dekauliya
dekauliya

Reputation: 1433

How do I combine array columns into array of struct in BigQuery?

How can we construct an array of struct, in which each row in the array corresponds to several array columns having the same row number?

I will show you an example of what I meant to better illustrate my problem.

Suppose I want to query a table named table_1 that looks like this: Example table

So here, col_2, col_3, col_4, and col_5 are all array columns in BigQuery. I know that col_2, col_3, and col_4 in table_1 always have the same number of rows. And I need to concatenate them into an array of structs, in which the combination of each row in the array columns will form one struct row in the new array. So the resulting array will be (x, 1, s), (y, 2, t), and (z, 3, u)

Suppose this concatenated column is called comb_col. This comb_col is essentially an array of structs. Each row is a struct composed of col_2, col_3, and col_4 as in the table below: Example table2

Note that the value of comb_col should be (x, 1, s), (y, 2, t), and (z, 3, u), and not ([x, y, z], [1, 2, 3], [s, t, u])

What's the simplest way to do this? I tried using UNNEST with cross joins, but this will make CROSS JOIN, not CONCATENATION (so in the example above, 9 rows will be produced instead of 3 rows of combined columns).

I thought of using row number as well with ROW_NUMBER(), UNNEST with cross joins, then filter only those rows having the same row number as follows. My solution achieves what I wanted, however, the solution is a bit wordy and complicated. Is there a better and simpler way to achieve my objective?

CREATE TEMP FUNCTION ADD_ROW_NUMBER(arr ANY TYPE) AS (
  ARRAY(
    SELECT AS STRUCT
      data, 
      row_number() OVER () AS rn
    FROM UNNEST(arr) data
  )
); 

SELECT
  id, 
  col_1,
  ARRAY(
    (SELECT AS STRUCT
        c2.data AS col_1, 
        c3.data AS col_2,
        c4.data AS col_3
      FROM 
        UNNEST(ADD_ROW_NUMBER(col_1)) c1, 
        UNNEST(ADD_ROW_NUMBER(col_2)) c2, 
        UNNEST(ADD_ROW_NUMBER(col_3)) c3 
      WHERE c1.rn = c2.rn AND c2.rn = c3.rn)
  ) AS comb_col,
  col_5,
  ARRAY(
    SELECT 
      col_5 * 10
  ) AS col_5_times_10
FROM table_1

Thank you.

Upvotes: 1

Views: 3399

Answers (1)

Mikhail Berlyant
Mikhail Berlyant

Reputation: 172954

Is there a better and simpler way to achieve my objective?

Below is for BigQuery Standard SQL

#standardSQL
SELECT id, col_1,
  ARRAY(SELECT AS STRUCT col_2, col_3, col_4 
    FROM UNNEST(col_2) col_2 WITH OFFSET 
    JOIN UNNEST(col_3) col_3 WITH OFFSET USING(OFFSET)
    JOIN UNNEST(col_4) col_4 WITH OFFSET USING(OFFSET)
  ) comb_col,
  col_5
FROM `project.dataset.table`   

It might be a matter of opinion - but it looks less verbose to me

Obviously, if to apply to sample data from you question - result is as expected

Row id  col_1   comb_col.col_2  comb_col.col_3  comb_col.col_4  col_5    
1   1   a       x               1               s               4    
                y               2               t               5    
                z               3               u        

Upvotes: 3

Related Questions