Reputation: 3369
I am working with Google Analytics data in BigQuery.
I'd like to show a list of transaction ids from users who visited a particular page on a website in their session, I've unnested hits.page.pagepath
in order to identify a particular page, but since I don't know which row the actual transaction ID will occur on I am having trouble returning meaningful results.
My code looks like this, but is returning 0 results, as all the transaction Ids are NULL values, since they do not happen on rows where the page path meets the AND hits.page.pagePath LIKE "%clear-out%"
condition:
SELECT hits.transaction.transactionId AS orderid
FROM `xxx.xxx.ga_sessions_20*` AS t
CROSS JOIN UNNEST(hits) AS hits
WHERE parse_date('%y%m%d', _table_suffix) between
DATE_sub(current_date(), interval 1 day) and
DATE_sub(current_date(), interval 1 day)
AND totals.transactions > 0
AND hits.page.pagePath LIKE "%clear-out%"
AND hits.transaction.transactionId IS NOT NULL
How can I say, for example, return the transaction Ids for all sessions where the user viewed AND hits.page.pagePath LIKE "%clear-out%"
?
Upvotes: 1
Views: 601
Reputation: 4736
When cross joining, you're repeating the whole session for each hit. Use this nested info per hit to look for your page - not the cross joined hits. You're unfortunately giving both the same name. It's better to keep them seperate - here's what it could look like:
SELECT
h.transaction.transactionId AS orderId
--,ARRAY( (SELECT AS STRUCT hitnumber, page.pagePath, transaction.transactionId FROM t.hits ) ) AS hitInfos -- test: show all hits in this session
FROM
`google.com:analytics-bigquery.LondonCycleHelmet.ga_sessions_20130910` AS t
CROSS JOIN t.hits AS h
WHERE
totals.transactions > 0 AND h.transaction.transactionId IS NOT NULL
AND
-- use the repeated hits nest (not the cross joined 'h') to check all pagePaths in the session
(SELECT LOGICAL_OR(page.pagePath LIKE "/helmets/%") FROM t.hits )
LOGICAL_OR()
is an aggregation function for OR
- so if any hit matches the condition it returns TRUE
(This query uses the openly available GA data from Google. It's a bit old but good to play around with.)
Upvotes: 2