Reputation: 1433
How can we construct an array of struct, in which each row in the array corresponds to several array columns having the same row number?
I will show you an example of what I meant to better illustrate my problem.
Suppose I want to query a table named table_1 that looks like this:
So here, col_2, col_3, col_4, and col_5 are all array columns in BigQuery. I know that col_2, col_3, and col_4 in table_1 always have the same number of rows. And I need to concatenate them into an array of structs, in which the combination of each row in the array columns will form one struct row in the new array. So the resulting array will be (x, 1, s), (y, 2, t), and (z, 3, u)
Suppose this concatenated column is called comb_col. This comb_col is essentially an array of structs. Each row is a struct composed of col_2, col_3, and col_4 as in the table below:
Note that the value of comb_col should be (x, 1, s), (y, 2, t), and (z, 3, u), and not ([x, y, z], [1, 2, 3], [s, t, u])
What's the simplest way to do this? I tried using UNNEST with cross joins, but this will make CROSS JOIN, not CONCATENATION (so in the example above, 9 rows will be produced instead of 3 rows of combined columns).
I thought of using row number as well with ROW_NUMBER(), UNNEST with cross joins, then filter only those rows having the same row number as follows. My solution achieves what I wanted, however, the solution is a bit wordy and complicated. Is there a better and simpler way to achieve my objective?
CREATE TEMP FUNCTION ADD_ROW_NUMBER(arr ANY TYPE) AS (
ARRAY(
SELECT AS STRUCT
data,
row_number() OVER () AS rn
FROM UNNEST(arr) data
)
);
SELECT
id,
col_1,
ARRAY(
(SELECT AS STRUCT
c2.data AS col_1,
c3.data AS col_2,
c4.data AS col_3
FROM
UNNEST(ADD_ROW_NUMBER(col_1)) c1,
UNNEST(ADD_ROW_NUMBER(col_2)) c2,
UNNEST(ADD_ROW_NUMBER(col_3)) c3
WHERE c1.rn = c2.rn AND c2.rn = c3.rn)
) AS comb_col,
col_5,
ARRAY(
SELECT
col_5 * 10
) AS col_5_times_10
FROM table_1
Thank you.
Upvotes: 1
Views: 3399
Reputation: 172954
Is there a better and simpler way to achieve my objective?
Below is for BigQuery Standard SQL
#standardSQL
SELECT id, col_1,
ARRAY(SELECT AS STRUCT col_2, col_3, col_4
FROM UNNEST(col_2) col_2 WITH OFFSET
JOIN UNNEST(col_3) col_3 WITH OFFSET USING(OFFSET)
JOIN UNNEST(col_4) col_4 WITH OFFSET USING(OFFSET)
) comb_col,
col_5
FROM `project.dataset.table`
It might be a matter of opinion - but it looks less verbose to me
Obviously, if to apply to sample data from you question - result is as expected
Row id col_1 comb_col.col_2 comb_col.col_3 comb_col.col_4 col_5
1 1 a x 1 s 4
y 2 t 5
z 3 u
Upvotes: 3