Reputation: 207982
The data that comes out from the BigQuery implementation of GoogleAnalytics raw data looks like this:
|-visitId
|- date
|- (....)
+- hits
|- time
|- page
|- pagePath
|- eventInfo
|- eventAction
+- customDimensions
|- index
|- value
I am looking to grab 3 values out from the repeated customDimensions
like
+---------+---------+-------+-----------+---------------+
| user_id | country | split | page Hits | CTA event hit |
+---------+---------+-------+-----------+---------------+
| 100 | US | A | 25000 | 500 |
+---------+---------+-------+-----------+---------------+
| 100 | US | B | 8000 | 90 |
+---------+---------+-------+-----------+---------------+
| 200 | ES | A | 400 | 2 |
+---------+---------+-------+-----------+---------------+
the first three columns are defined by hits.customDimensions.index
1,4,7.
The page hit
SUM is how many views they did, the CTA event hit
is sum of the event that is fired if they click a button on a page itself. For the sake of SQL simplicity we can name hits.page.pagePath='tshirt'
and hits.eventInfo.eventAction='upsell'
I am having difficulty reading the 3 customDimensions out from the same repated field, then I am having difficulty locating the events that happened in the same session.
Update for those that are not familiar with BQ dataset
In the below image each line is a hit, and multiple hits can be on the same row. In BigQuery that is called REPEATED field. In the image you see 3 taller rows. The first row has 8 hit. The image doesn't contain multiple customDimensions, but that can be multiple for the same hit. To access a sample DB set on BigQuery read here it's free.
Upvotes: 2
Views: 469
Reputation: 11787
Before answering, I'd like to show the mock data that I used as a guide to come up with a solution, hopefully it will be useful:
WITH mock_data AS(
select '0' fullvisitorid, 0 visitid, ARRAY<STRUCT< hitnumber INT64, customdimensions ARRAY<STRUCT<index INT64, value STRING>>, page STRUCT<pagepath STRING>, eventinfo STRUCT<eventcategory STRING, eventlabel STRING, eventaction STRING> >> [STRUCT(0 as hitnumber, [STRUCT(0 as index, '' as value)] as customdimensions, STRUCT('/home' as pagepath) as page, STRUCT('' as eventcategory, '' as eventlabel, '' as eventaction) as eventinfo)] hits union all
select '0' fullvisitorid, 0 visitid, ARRAY<STRUCT< hitnumber INT64, customdimensions ARRAY<STRUCT<index INT64, value STRING>>, page STRUCT<pagepath STRING>, eventinfo STRUCT<eventcategory STRING, eventlabel STRING, eventaction STRING> >> [STRUCT(1 as hitnumber, [STRUCT(0 as index, '' as value)] as customdimensions, STRUCT('/randompage' as pagepath) as page, STRUCT('' as eventcategory, '' as eventlabel, '' as eventaction) as eventinfo)] hits union all
select '0' fullvisitorid, 0 visitid, ARRAY<STRUCT< hitnumber INT64, customdimensions ARRAY<STRUCT<index INT64, value STRING>>, page STRUCT<pagepath STRING>, eventinfo STRUCT<eventcategory STRING, eventlabel STRING, eventaction STRING> >> [STRUCT(2 as hitnumber, [STRUCT(1 as index, '000' as value), STRUCT(4 as index, 'US' as value), STRUCT(7 as index, 'A' as value)] as customdimensions, STRUCT('tshirt' as pagepath) as page, STRUCT('' as eventcategory, '' as eventlabel, '' as eventaction) as eventinfo)] hits union all
select '0' fullvisitorid, 0 visitid, ARRAY<STRUCT< hitnumber INT64, customdimensions ARRAY<STRUCT<index INT64, value STRING>>, page STRUCT<pagepath STRING>, eventinfo STRUCT<eventcategory STRING, eventlabel STRING, eventaction STRING> >> [STRUCT(3 as hitnumber, [STRUCT(1 as index, '000' as value), STRUCT(4 as index, 'US' as value), STRUCT(7 as index, 'A' as value)] as customdimensions, STRUCT('tshirt' as pagepath) as page, STRUCT('' as eventcategory, '' as eventlabel, '' as eventaction) as eventinfo)] hits union all
select '1' fullvisitorid, 0 visitid, ARRAY<STRUCT< hitnumber INT64, customdimensions ARRAY<STRUCT<index INT64, value STRING>>, page STRUCT<pagepath STRING>, eventinfo STRUCT<eventcategory STRING, eventlabel STRING, eventaction STRING> >> [STRUCT(0 as hitnumber, [STRUCT(0 as index, '' as value)] as customdimensions, STRUCT('/home' as pagepath) as page, STRUCT('' as eventcategory, '' as eventlabel, '' as eventaction) as eventinfo)] hits union all
select '1' fullvisitorid, 0 visitid, ARRAY<STRUCT< hitnumber INT64, customdimensions ARRAY<STRUCT<index INT64, value STRING>>, page STRUCT<pagepath STRING>, eventinfo STRUCT<eventcategory STRING, eventlabel STRING, eventaction STRING> >> [STRUCT(1 as hitnumber, [STRUCT(0 as index, '' as value)] as customdimensions, STRUCT('/randompage' as pagepath) as page, STRUCT('' as eventcategory, '' as eventlabel, '' as eventaction) as eventinfo)] hits union all
select '1' fullvisitorid, 0 visitid, ARRAY<STRUCT< hitnumber INT64, customdimensions ARRAY<STRUCT<index INT64, value STRING>>, page STRUCT<pagepath STRING>, eventinfo STRUCT<eventcategory STRING, eventlabel STRING, eventaction STRING> >> [STRUCT(2 as hitnumber, [STRUCT(1 as index, '100' as value), STRUCT(4 as index, 'US' as value), STRUCT(7 as index, 'A' as value)] as customdimensions, STRUCT('tshirt' as pagepath) as page, STRUCT('' as eventcategory, '' as eventlabel, '' as eventaction) as eventinfo)] hits union all
select '1' fullvisitorid, 0 visitid, ARRAY<STRUCT< hitnumber INT64, customdimensions ARRAY<STRUCT<index INT64, value STRING>>, page STRUCT<pagepath STRING>, eventinfo STRUCT<eventcategory STRING, eventlabel STRING, eventaction STRING> >> [STRUCT(3 as hitnumber, [STRUCT(0 as index, '' as value)] as customdimensions, STRUCT('tshirt' as pagepath) as page, STRUCT('specific_category' as eventcategory, 'specific_label' as eventlabel, 'upsell' as eventaction) as eventinfo)] hits union all
select '1' fullvisitorid, 0 visitid, ARRAY<STRUCT< hitnumber INT64, customdimensions ARRAY<STRUCT<index INT64, value STRING>>, page STRUCT<pagepath STRING>, eventinfo STRUCT<eventcategory STRING, eventlabel STRING, eventaction STRING> >> [STRUCT(4 as hitnumber, [STRUCT(1 as index, '100' as value), STRUCT(4 as index, 'US' as value), STRUCT(7 as index, 'A' as value)] as customdimensions, STRUCT('tshirt' as pagepath) as page, STRUCT('' as eventcategory, '' as eventlabel, '' as eventaction) as eventinfo)] hits union all
select '2' fullvisitorid, 0 visitid, ARRAY<STRUCT< hitnumber INT64, customdimensions ARRAY<STRUCT<index INT64, value STRING>>, page STRUCT<pagepath STRING>, eventinfo STRUCT<eventcategory STRING, eventlabel STRING, eventaction STRING> >> [STRUCT(0 as hitnumber, [STRUCT(0 as index, '' as value)] as customdimensions, STRUCT('/home' as pagepath) as page, STRUCT('' as eventcategory, '' as eventlabel, '' as eventaction) as eventinfo)] hits union all
select '2' fullvisitorid, 0 visitid, ARRAY<STRUCT< hitnumber INT64, customdimensions ARRAY<STRUCT<index INT64, value STRING>>, page STRUCT<pagepath STRING>, eventinfo STRUCT<eventcategory STRING, eventlabel STRING, eventaction STRING> >> [STRUCT(1 as hitnumber, [STRUCT(0 as index, '' as value)] as customdimensions, STRUCT('/randompage' as pagepath) as page, STRUCT('' as eventcategory, '' as eventlabel, '' as eventaction) as eventinfo)] hits union all
select '2' fullvisitorid, 0 visitid, ARRAY<STRUCT< hitnumber INT64, customdimensions ARRAY<STRUCT<index INT64, value STRING>>, page STRUCT<pagepath STRING>, eventinfo STRUCT<eventcategory STRING, eventlabel STRING, eventaction STRING> >> [STRUCT(2 as hitnumber, [STRUCT(0 as index, '' as value)] as customdimensions, STRUCT('tshirt' as pagepath) as page, STRUCT('specific_category' as eventcategory, 'specific_label' as eventlabel, 'upsell' as eventaction) as eventinfo)] hits union all
select '3' fullvisitorid, 0 visitid, ARRAY<STRUCT< hitnumber INT64, customdimensions ARRAY<STRUCT<index INT64, value STRING>>, page STRUCT<pagepath STRING>, eventinfo STRUCT<eventcategory STRING, eventlabel STRING, eventaction STRING> >> [STRUCT(0 as hitnumber, [STRUCT(0 as index, '' as value)] as customdimensions, STRUCT('/home' as pagepath) as page, STRUCT('' as eventcategory, '' as eventlabel, '' as eventaction) as eventinfo)] hits union all
select '3' fullvisitorid, 0 visitid, ARRAY<STRUCT< hitnumber INT64, customdimensions ARRAY<STRUCT<index INT64, value STRING>>, page STRUCT<pagepath STRING>, eventinfo STRUCT<eventcategory STRING, eventlabel STRING, eventaction STRING> >> [STRUCT(1 as hitnumber, [STRUCT(0 as index, '' as value)] as customdimensions, STRUCT('/randompage' as pagepath) as page, STRUCT('' as eventcategory, '' as eventlabel, '' as eventaction) as eventinfo)] hits union all
select '3' fullvisitorid, 1 visitid, ARRAY<STRUCT< hitnumber INT64, customdimensions ARRAY<STRUCT<index INT64, value STRING>>, page STRUCT<pagepath STRING>, eventinfo STRUCT<eventcategory STRING, eventlabel STRING, eventaction STRING> >> [STRUCT(2 as hitnumber, [STRUCT(1 as index, '300' as value), STRUCT(4 as index, 'US' as value), STRUCT(7 as index, 'A' as value)] as customdimensions, STRUCT('tshirt' as pagepath) as page, STRUCT('' as eventcategory, '' as eventlabel, '' as eventaction) as eventinfo)] hits union all
select '3' fullvisitorid, 1 visitid, ARRAY<STRUCT< hitnumber INT64, customdimensions ARRAY<STRUCT<index INT64, value STRING>>, page STRUCT<pagepath STRING>, eventinfo STRUCT<eventcategory STRING, eventlabel STRING, eventaction STRING> >> [STRUCT(3 as hitnumber, [STRUCT(1 as index, '300' as value), STRUCT(4 as index, 'US' as value), STRUCT(7 as index, 'A' as value)] as customdimensions, STRUCT('tshirt' as pagepath) as page, STRUCT('' as eventcategory, '' as eventlabel, '' as eventaction) as eventinfo)] hits union all
select '4' fullvisitorid, 0 visitid, ARRAY<STRUCT< hitnumber INT64, customdimensions ARRAY<STRUCT<index INT64, value STRING>>, page STRUCT<pagepath STRING>, eventinfo STRUCT<eventcategory STRING, eventlabel STRING, eventaction STRING> >> [STRUCT(0 as hitnumber, [STRUCT(0 as index, '' as value)] as customdimensions, STRUCT('/home' as pagepath) as page, STRUCT('' as eventcategory, '' as eventlabel, '' as eventaction) as eventinfo)] hits union all
select '4' fullvisitorid, 0 visitid, ARRAY<STRUCT< hitnumber INT64, customdimensions ARRAY<STRUCT<index INT64, value STRING>>, page STRUCT<pagepath STRING>, eventinfo STRUCT<eventcategory STRING, eventlabel STRING, eventaction STRING> >> [STRUCT(1 as hitnumber, [STRUCT(0 as index, '' as value)] as customdimensions, STRUCT('/randompage' as pagepath) as page, STRUCT('' as eventcategory, '' as eventlabel, '' as eventaction) as eventinfo)] hits union all
select '4' fullvisitorid, 0 visitid, ARRAY<STRUCT< hitnumber INT64, customdimensions ARRAY<STRUCT<index INT64, value STRING>>, page STRUCT<pagepath STRING>, eventinfo STRUCT<eventcategory STRING, eventlabel STRING, eventaction STRING> >> [STRUCT(2 as hitnumber, [STRUCT(1 as index, '400' as value), STRUCT(4 as index, 'BR' as value), STRUCT(7 as index, 'B' as value)] as customdimensions, STRUCT('tshirt' as pagepath) as page, STRUCT('' as eventcategory, '' as eventlabel, '' as eventaction) as eventinfo)] hits union all
select '4' fullvisitorid, 1 visitid, ARRAY<STRUCT< hitnumber INT64, customdimensions ARRAY<STRUCT<index INT64, value STRING>>, page STRUCT<pagepath STRING>, eventinfo STRUCT<eventcategory STRING, eventlabel STRING, eventaction STRING> >> [STRUCT(0 as hitnumber, [STRUCT(0 as index, '' as value)] as customdimensions, STRUCT('tshirt' as pagepath) as page, STRUCT('specific_category' as eventcategory, 'specific_label' as eventlabel, 'upsell' as eventaction) as eventinfo)] hits union all
select '4' fullvisitorid, 1 visitid, ARRAY<STRUCT< hitnumber INT64, customdimensions ARRAY<STRUCT<index INT64, value STRING>>, page STRUCT<pagepath STRING>, eventinfo STRUCT<eventcategory STRING, eventlabel STRING, eventaction STRING> >> [STRUCT(1 as hitnumber, [STRUCT(1 as index, '400' as value), STRUCT(4 as index, 'BR' as value), STRUCT(7 as index, 'B' as value)] as customdimensions, STRUCT('tshirt' as pagepath) as page, STRUCT('' as eventcategory, '' as eventlabel, '' as eventaction) as eventinfo)] hits union all
select '4' fullvisitorid, 2 visitid, ARRAY<STRUCT< hitnumber INT64, customdimensions ARRAY<STRUCT<index INT64, value STRING>>, page STRUCT<pagepath STRING>, eventinfo STRUCT<eventcategory STRING, eventlabel STRING, eventaction STRING> >> [STRUCT(0 as hitnumber, [STRUCT(1 as index, '400' as value), STRUCT(4 as index, 'BR' as value), STRUCT(7 as index, 'B' as value)] as customdimensions, STRUCT('tshirt' as pagepath) as page, STRUCT('' as eventcategory, '' as eventlabel, '' as eventaction) as eventinfo)] hits union all
select '4' fullvisitorid, 2 visitid, ARRAY<STRUCT< hitnumber INT64, customdimensions ARRAY<STRUCT<index INT64, value STRING>>, page STRUCT<pagepath STRING>, eventinfo STRUCT<eventcategory STRING, eventlabel STRING, eventaction STRING> >> [STRUCT(1 as hitnumber, [STRUCT(0 as index, '' as value)] as customdimensions, STRUCT('tshirt' as pagepath) as page, STRUCT('specific_category' as eventcategory, 'specific_label' as eventlabel, 'upsell' as eventaction) as eventinfo)] hits union all
select '4' fullvisitorid, 2 visitid, ARRAY<STRUCT< hitnumber INT64, customdimensions ARRAY<STRUCT<index INT64, value STRING>>, page STRUCT<pagepath STRING>, eventinfo STRUCT<eventcategory STRING, eventlabel STRING, eventaction STRING> >> [STRUCT(2 as hitnumber, [STRUCT(0 as index, '' as value)] as customdimensions, STRUCT('tshirt' as pagepath) as page, STRUCT('specific_category' as eventcategory, 'specific_label' as eventlabel, 'upsell' as eventaction) as eventinfo)] hits union all
select '4' fullvisitorid, 3 visitid, ARRAY<STRUCT< hitnumber INT64, customdimensions ARRAY<STRUCT<index INT64, value STRING>>, page STRUCT<pagepath STRING>, eventinfo STRUCT<eventcategory STRING, eventlabel STRING, eventaction STRING> >> [STRUCT(0 as hitnumber, [STRUCT(1 as index, '400' as value), STRUCT(4 as index, 'BR' as value), STRUCT(7 as index, 'A' as value)] as customdimensions, STRUCT('tshirt' as pagepath) as page, STRUCT('' as eventcategory, '' as eventlabel, '' as eventaction) as eventinfo)] hits union all
select '4' fullvisitorid, 3 visitid, ARRAY<STRUCT< hitnumber INT64, customdimensions ARRAY<STRUCT<index INT64, value STRING>>, page STRUCT<pagepath STRING>, eventinfo STRUCT<eventcategory STRING, eventlabel STRING, eventaction STRING> >> [STRUCT(1 as hitnumber, [STRUCT(0 as index, '' as value)] as customdimensions, STRUCT('tshirt' as pagepath) as page, STRUCT('specific_category' as eventcategory, 'specific_label' as eventlabel, 'upsell' as eventaction) as eventinfo)] hits union all
select '4' fullvisitorid, 3 visitid, ARRAY<STRUCT< hitnumber INT64, customdimensions ARRAY<STRUCT<index INT64, value STRING>>, page STRUCT<pagepath STRING>, eventinfo STRUCT<eventcategory STRING, eventlabel STRING, eventaction STRING> >> [STRUCT(2 as hitnumber, [STRUCT(1 as index, '400' as value), STRUCT(4 as index, 'BR' as value), STRUCT(7 as index, 'A' as value)] as customdimensions, STRUCT('tshirt' as pagepath) as page, STRUCT('' as eventcategory, '' as eventlabel, '' as eventaction) as eventinfo)] hits
)
I've simulated 4 distinct users visiting the website using the same schema we find in BigQuery ga_sessions
table.
Some of my assumptions might be a bit different from your actual data. If this is the case, please let me know and we can adapt the mock data as a guide for more accurate answers (I actually use these mocks to run integration tests in our production environment so they can be helpful).
The assumptions I did was (correct me if I'm wrong):
customDimensions
are only fired when hits.page.pagepath=tshirt
eventAction
click happens, customDimensions
are not fired at the same time (i.e, the events are fired in one hitNumber and the customized events into another).This might give the result expected:
select
user_id,
country,
_split,
sum(page_hits) page_hits,
sum(CTA_event_hit) CTA_event_hit
from(
select
max(user_id) user_id,
max(country) country,
max(_split) _split,
max(page_hits) page_hits,
max(CTA_event_hit) CTA_event_hit
from(
select
fv,
v,
user_id,
country,
_split,
count(case when user_id is not null then 1 end) page_hits,
sum(click_flag) CTA_event_hit
from(
select
fullvisitorid fv,
visitid v,
(select custd.value from unnest(hits.customdimensions) custd where custd.index = 1) user_id,
(select custd.value from unnest(hits.customdimensions) custd where custd.index = 4) country,
(select custd.value from unnest(hits.customdimensions) custd where custd.index = 7) _split,
case when hits.eventinfo.eventcategory = 'specific_category' and hits.eventinfo.eventlabel = 'specific_label' and hits.eventinfo.eventaction = 'upsell' then 1 end click_flag
from mock_data,
unnest(hits) hits
where 1 = 1
and hits.page.pagepath = 'tshirt'
)
group by fv, v, user_id, country, _split
)
group by fv, v
having user_id is not null
)
group by user_id, country, _split
That results in:
Basically, the are some subselect
queries to retrieve user_id, country and split. For every session (visitid) the data is aggregated by using the MAX
operator and finally there's a final aggregation on the user_id, country and split level.
In other to query on your dataset you just need to change mock_data
to the correspondent ga_session
desired table.
Not sure if this will solve your problem but might be of some help.
As a final note, it seems like this data is the setup for maybe an AB test or some analyzes of performances for different variations of your site. This being the case, I'd recommend not allowing users to change their split value as this can lead to some poisoning of the data (which might skew results).
Upvotes: 2
Reputation: 14014
To make sure I understood the problem, I only provide here solution for computing custom columns and page hits metric, but not (yet) for CTA event hit. Using sample GA table and Standard SQL, it can look something like this:
SELECT
ARRAY(SELECT AS STRUCT c.product, c.color, 1 page_hits
FROM t.hits hit CROSS JOIN
UNNEST(ARRAY(
SELECT DISTINCT AS STRUCT
if(dim.index = 1, dim.value, NULL) product,
if(dim.index = 2, dim.value, NULL) color
FROM hit.customDimensions dim
WHERE dim.index in (1,2))) c
)
FROM `google.com:analytics-bigquery.LondonCycleHelmet.ga_sessions_20130910` t
Basically, in the inner SELECT we convert customDimensions.index into separate columns (product and color in this example), and then the outer SELECT prepares to count them by setting page_hits to 1 for every hit.
Upvotes: 1