Reputation: 5510
The following is a query to flatten "hit" results piped to BigQuery from Google Analytics:
SELECT
*
FROM flatten(flatten(flatten(flatten(flatten(flatten(flatten(flatten(flatten([ga_sessions_20171116], hits), hits.product), hits.product.customDimensions), hits.product.customMetrics), hits.promotion), hits.experiment), hits.customDimensions), hits.customVariables), hits.customMetrics)
Limit 20
How would one do the same across a range of tables, or is that even possible? I tried:
SELECT
*
FROM flatten(flatten(flatten(flatten(flatten(flatten(flatten(flatten(flatten([ga_sessions_2017111*], hits), hits.product), hits.product.customDimensions), hits.product.customMetrics), hits.promotion), hits.experiment), hits.customDimensions), hits.customVariables), hits.customMetrics)
WHERE _TABLE_SUFFIX BETWEEN '0' and '10'
Limit 20
But it didn't work. Does anyone know how one would do this?
Upvotes: 1
Views: 1610
Reputation: 11787
Working with nested data in Standard SQL is much easier than it is in Legacy (because of query syntax and predictability of behavior).
This being said, do consider using it. Your query in Standard becomes something like:
SELECT
fullvisitorid visitor_id,
prods.productSku sku,
custd.index index
FROM `project_id.dataset_id.ga_sessions_*`,
UNNEST(hits) hits,
UNNEST(hits.product) prods,
UNNEST(prods.customDimensions) custd
WHERE _TABLE_SUFFIX BETWEEN '20171110' and '20171111'
LIMIT 1000
This is just an example but hopefully it's enough to understand the concept.
hits
is a repeated field of structs so it's something like:
hits = [{'hitNumber': 1, 'product': [{'productSku': 'sku0'}]}, {'hitNumber': 2}, ...]
When you apply the unnest(hits) AS unnested_hits
operation it becomes:
unnested_hits = {'hitNumber': 1, 'product': [{'productSku': 'sku0'}]},
{'hitNumber': 2}
...
So if you call it "unnested_hits"
, when you reference to this alias you'll get this flattened data. You can go on such as unnesting the field product
inside unnested_hits
).
For a deeper understanding of these concepts, make sure to read through the docs, they are quite well written and you can learn pretty much everything you'll need to work effectively in BigQuery.
As a final note, you are selecting all fields from GA. As the old saying goes, every time someone runs a query of the type "SELECT * FROM TABLE"
a panda somewhere in the world dies.
You must be very careful to run this type of query in BQ as you'll get billed by volume of processed data; make sure you are bringing only what is absolutely necessary.
Upvotes: 3