BigQuery: flatten multiple repeated columns of different length

Question

I followed the answers to the question "BigQuery: flatten two repeated columns". They come closest to what I'm looking for, but they don't quite work for my case.

I have data being sent from Google Analytics to Google BigQuery from an app. The export has 10 record columns, three of which are repeated:

event_params [RECORD REPEATED]
user_properties [RECORD REPEATED]
user_ltv [RECORD NULLABLE]
device [RECORD NULLABLE]
geo [RECORD NULLABLE]
app_info [RECORD NULLABLE]
traffic_source [RECORD NULLABLE]
event_dimensions [RECORD NULLABLE]
ecommerce [RECORD NULLABLE]
items [RECORD REPEATED]

Each new event is one row, and every row has an:

event_date
event_timestamp
event_name

The repeated columns can have different lengths for each event, and their elements are matched up by index within each event.

Below is a snapshot of just the first two repeated columns event_params and user_properties along with what I would want to generate with these two columns and the others if needed:

Here we see that event_params has length 7 and user_properties has length 4. When I run the following code:

-- standardSQL
SELECT
    event_name, event_params,
    user_properties[OFFSET(off)] AS user_properties
FROM
    `yepic-2021.analytics_264796885.events_intraday_*`,
    UNNEST(event_params) AS event_params WITH OFFSET off
ORDER BY
    event_timestamp DESC
LIMIT 50

it results in the error:

Array index 4 is out of bounds (overflow)

This makes sense because the arrays are not the same length. My thought was that if the shorter columns could be padded with NULLs until every column is as long as the longest one, this would produce the fully flattened output that I want. Does anyone know how to do that?
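As a minimal sketch of the padding idea, the out-of-bounds error can be avoided by reading the second array with SAFE_OFFSET, which returns NULL instead of raising an error when the index does not exist. The inline `sample` table and its string values are hypothetical stand-ins for the real export, where the array elements are records rather than strings. Note that this only pads correctly when event_params is the longest array, because the rows are driven by its offsets.

```sql
-- standardSQL
-- Hypothetical inline data: one event with 3 params and 1 user property.
WITH sample AS (
  SELECT
    'e1' AS event_name,
    ['p0', 'p1', 'p2'] AS event_params,
    ['u0'] AS user_properties
)
SELECT
  event_name,
  off,
  event_param,
  -- SAFE_OFFSET yields NULL when off is past the end of user_properties
  user_properties[SAFE_OFFSET(off)] AS user_property
FROM sample,
  UNNEST(event_params) AS event_param WITH OFFSET off
ORDER BY off
```

If user_properties were longer than event_params, its extra elements would be silently dropped, so this is only a partial fix.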

This is an example of what I do not want: flattening the already flattened table produces an explosion of duplicates:

-- standardSQL
SELECT
    event_name, event_params, user_properties
FROM
    `yepic-2021.analytics_264796885.events_intraday_*`,
    UNNEST(event_params) AS event_params,
    UNNEST(user_properties) AS user_properties
ORDER BY
    event_timestamp DESC
LIMIT 50

Results:
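To make the duplication concrete, here is a small sketch on hypothetical inline data (the `sample` table and its values are made up for illustration). Comma-joining two UNNEST calls is a cross join, so each event produces one row per pair of elements: the product of the two array lengths rather than the length of the longer array.

```sql
-- standardSQL
-- Hypothetical inline data: 3 params and 2 user properties for one event.
WITH sample AS (
  SELECT
    'e1' AS event_name,
    ['p0', 'p1', 'p2'] AS event_params,
    ['u0', 'u1'] AS user_properties
)
SELECT
  event_name,
  -- product of the array lengths: 3 * 2
  ARRAY_LENGTH(event_params) * ARRAY_LENGTH(user_properties) AS expected_rows,
  -- number of rows the cross join actually produced for this event
  COUNT(*) AS actual_rows
FROM sample,
  UNNEST(event_params) AS event_param,
  UNNEST(user_properties) AS user_property
GROUP BY event_name, expected_rows
```

Both counts come out as 6 here. By the same arithmetic, the event in the snapshot with 7 params and 4 user properties would expand to 28 rows.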

If anyone could help with this approach, or knows BigQuery better than I do and can suggest a simple method to flatten the data from GA, I would really appreciate your help.

TEMP EDITS:

Here is the code that I have tried in BigQuery:

-- standardSQL
WITH data1 AS (
    SELECT GENERATE_UUID() AS row_id, event_params, user_properties
    FROM `yepic-2021.analytics_264796885.events_intraday_*`
),
data2 AS (
    SELECT
        *,
        GENERATE_ARRAY(
            1,
            GREATEST(ARRAY_LENGTH(event_params), ARRAY_LENGTH(user_properties))
        ) AS ordinals
    FROM data1
)
SELECT
    row_id,
    event_params[SAFE_ORDINAL(o)] AS event_params,
    user_properties[SAFE_ORDINAL(o)] AS user_properties
FROM data2, UNNEST(ordinals) AS o
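The same ordinal approach can be sketched on inline data to see the NULL padding, and extended to a third repeated column such as items. Everything in the `sample` table below is hypothetical (plain strings instead of the export's records, and made-up event names). One assumption worth flagging: GENERATE_ARRAY(1, 0) is an empty array, so with a comma join an event whose arrays are all empty would disappear from the output; the sketch uses LEFT JOIN UNNEST to keep such an event as a single all-NULL row.

```sql
-- standardSQL
-- Hypothetical inline data standing in for the GA export.
WITH sample AS (
  SELECT
    'first_open' AS event_name,
    ARRAY<STRING>['p1', 'p2', 'p3'] AS event_params,
    ARRAY<STRING>['u1'] AS user_properties,
    ARRAY<STRING>[] AS items
  UNION ALL
  SELECT 'empty_event', ARRAY<STRING>[], ARRAY<STRING>[], ARRAY<STRING>[]
)
SELECT
  event_name,
  pos,
  -- SAFE_ORDINAL is 1-based and returns NULL past the end of an array
  event_params[SAFE_ORDINAL(pos)] AS event_param,
  user_properties[SAFE_ORDINAL(pos)] AS user_property,
  items[SAFE_ORDINAL(pos)] AS item
FROM sample
-- one position per element of the longest array; LEFT JOIN keeps events
-- whose arrays are all empty
LEFT JOIN UNNEST(GENERATE_ARRAY(1, GREATEST(
  ARRAY_LENGTH(event_params),
  ARRAY_LENGTH(user_properties),
  ARRAY_LENGTH(items)
))) AS pos
ORDER BY event_name, pos
```

This should give three rows for first_open, with user_property NULL from the second position on and item NULL throughout, plus one all-NULL row for empty_event. Adding another repeated column means adding one more ARRAY_LENGTH argument and one more SAFE_ORDINAL lookup.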

Results:
