Reputation: 4050
I just started using BigQuery to explore my company's GA session data. I'm trying to generate a query that will allow me to generate the counts for each experimentId
that has been passed to GA.
The experimentId
is stored in hits.experiment.experimentId
as a nested field. So far I have the following query, but it seems inefficient to have to do the left join to get this info. How can I optimize this query for variables that are nested in another nested array?
SELECT
e.experimentId,
count(*)
FROM
`project-name.IDNUM.ga_sessions_20170527`,
UNNEST(hits) AS hits
LEFT JOIN UNNEST(hits.experiment) as e
GROUP BY e.experimentId
Upvotes: 4
Views: 15306
Reputation: 11777
You can use the CROSS JOIN
operation again using the unnested hits
field:
SELECT
e.experimentId,
count(*)
FROM
`project-name.IDNUM.ga_sessions_20170527`,
UNNEST(hits) AS hits,
UNNEST(hits.experiment) as e
GROUP BY e.experimentId
You can also find some examples in BQ docs on how to process repeated fields (arrays) and a bunch of different techniques that will help you on several types of analyzes you want to run in BQ.
Upvotes: 8