Reputation: 3
Using Google Analytics data in Google BigQuery, I am trying to recreate a GA segment and pull basic metrics like sessions. The segment definition is: custom dimension A = 1 or 5 or custom dimension B had values or Page = pageA or Page = pageB
The following code works well with the two custom dimension conditions. Once I added the page portion, the result was always all sessions for the day.
SELECT
COUNT(DISTINCT CONCAT(fullVisitorId, CAST(visitStartTime AS STRING))) AS Sessions
FROM
`123456789.ga_sessions_20200201` as Data,
unnest(hits) as hits
WHERE totals.visits = 1
-- custom dimension A
and (
(SELECT
value
FROM
UNNEST(hits.customDimensions)
WHERE
index = 9
GROUP BY
1) is not null
-- custom dimension B
or (SELECT
value
FROM
UNNEST(hits.customDimensions)
WHERE
index = 10
GROUP BY
1) in ('1','5')
-- Page
or Exists(Select
hits.page.pagePath AS Page
FROM
`123456789.ga_sessions_20200201` ,
unnest(hits) as hits
Where totals.visits = 1
AND hits.type = 'PAGE'
and hits.page.pagePath in ('pageA','pageB'))
)
Upvotes: 0
Views: 1609
Reputation: 2365
Your mistake is, you're using ga_sessions table in the third condition. Therefore, it scans the all table and exists condition returns True. So, it becomes true for all the rows.
Also, you don't have to join unnest(hits). It creates multiple rows for each session. If you handle it without joining nested hits, you'll have a single row for each session. So, it'll be easier to count them.
I updated it also, simplified the query, so I think this will give you what you want in your dataset.
SELECT
COUNT(*) AS Sessions
FROM
`123456789.ga_sessions_20200201` as Data
WHERE totals.visits = 1
and exists (
SELECT
1
FROM
UNNEST(hits) as hit
WHERE
EXISTS (select 1 from unnest(hit.customDimensions) where index = 9 and value is not null) -- custom dimension A
OR EXISTS (select 1 from unnest(hit.customDimensions) where index = 10 and value in ('1', '5')) -- custom dimension B
OR (hit.type = 'PAGE' and hit.page.PagePath in ('pageA', 'pageB')) -- Page
)
Upvotes: 1