fraxture
fraxture

Reputation: 5510

Getting flattened hit data from Google Analytics in Big Query over a range of tables

The following is a query to flatten "hit" results piped to BigQuery from Google Analytics:

SELECT
*
FROM flatten(flatten(flatten(flatten(flatten(flatten(flatten(flatten(flatten([ga_sessions_20171116], hits), hits.product), hits.product.customDimensions), hits.product.customMetrics), hits.promotion), hits.experiment), hits.customDimensions), hits.customVariables), hits.customMetrics)
Limit 20

How would one do the same across a range of tables, or is that even possible? I tried:

SELECT
*
FROM flatten(flatten(flatten(flatten(flatten(flatten(flatten(flatten(flatten([ga_sessions_2017111*], hits), hits.product), hits.product.customDimensions), hits.product.customMetrics), hits.promotion), hits.experiment), hits.customDimensions), hits.customVariables), hits.customMetrics)
WHERE _TABLE_SUFFIX BETWEEN '0' and '10'
Limit 20

But it didn't work. Does anyone know how one would do this?

Upvotes: 1

Views: 1610

Answers (1)

Willian Fuks
Willian Fuks

Reputation: 11787

Working with nested data in Standard SQL is much easier than it is in Legacy (because of query syntax and predictability of behavior).

This being said, do consider using it. Your query in Standard becomes something like:

SELECT
  fullvisitorid visitor_id,
  prods.productSku sku,
  custd.index index
FROM `project_id.dataset_id.ga_sessions_*`,
UNNEST(hits) hits,
UNNEST(hits.product) prods,
UNNEST(prods.customDimensions) custd
WHERE _TABLE_SUFFIX BETWEEN '20171110' and '20171111'
LIMIT 1000

This is just an example but hopefully it's enough to understand the concept.

hits is a repeated field of structs so it's something like:

hits = [{'hitNumber': 1, 'product': [{'productSku': 'sku0'}]}, {'hitNumber': 2}, ...]

When you apply the unnest(hits) AS unnested_hits operation it becomes:

unnested_hits = {'hitNumber': 1, 'product': [{'productSku': 'sku0'}]},
                {'hitNumber': 2}
                ...

So if you call it "unnested_hits", when you reference to this alias you'll get this flattened data. You can go on such as unnesting the field product inside unnested_hits).

For a deeper understanding of these concepts, make sure to read through the docs, they are quite well written and you can learn pretty much everything you'll need to work effectively in BigQuery.

As a final note, you are selecting all fields from GA. As the old saying goes, every time someone runs a query of the type "SELECT * FROM TABLE" a panda somewhere in the world dies.

You must be very careful to run this type of query in BQ as you'll get billed by volume of processed data; make sure you are bringing only what is absolutely necessary.

Upvotes: 3

Related Questions