Ben P
Ben P

Reputation: 3369

Show transactions from a user who saw X page/s in their session

I am working with Google Analytics data in BigQuery.

I'd like to show a list of transaction ids from users who visited a particular page on a website in their session, I've unnested hits.page.pagepath in order to identify a particular page, but since I don't know which row the actual transaction ID will occur on I am having trouble returning meaningful results.

My code looks like this, but is returning 0 results, as all the transaction Ids are NULL values, since they do not happen on rows where the page path meets the AND hits.page.pagePath LIKE "%clear-out%" condition:

SELECT hits.transaction.transactionId AS orderid
FROM `xxx.xxx.ga_sessions_20*` AS t
  CROSS JOIN UNNEST(hits) AS hits
WHERE parse_date('%y%m%d', _table_suffix) between 
DATE_sub(current_date(), interval 1 day) and
DATE_sub(current_date(), interval 1 day)
AND totals.transactions  > 0
AND hits.page.pagePath LIKE "%clear-out%"
AND hits.transaction.transactionId IS NOT NULL

How can I say, for example, return the transaction Ids for all sessions where the user viewed AND hits.page.pagePath LIKE "%clear-out%"?

Upvotes: 1

Views: 601

Answers (1)

Martin Weitzmann
Martin Weitzmann

Reputation: 4736

When cross joining, you're repeating the whole session for each hit. Use this nested info per hit to look for your page - not the cross joined hits. You're unfortunately giving both the same name. It's better to keep them seperate - here's what it could look like:

SELECT
  h.transaction.transactionId AS orderId
  --,ARRAY( (SELECT AS STRUCT hitnumber, page.pagePath, transaction.transactionId FROM t.hits ) ) AS hitInfos -- test: show all hits in this session
FROM
  `google.com:analytics-bigquery.LondonCycleHelmet.ga_sessions_20130910` AS t 
  CROSS JOIN t.hits AS h
WHERE
  totals.transactions > 0 AND h.transaction.transactionId IS NOT NULL
  AND
  -- use the repeated hits nest (not the cross joined 'h') to check all pagePaths in the session
  (SELECT LOGICAL_OR(page.pagePath LIKE "/helmets/%") FROM t.hits )

LOGICAL_OR() is an aggregation function for OR - so if any hit matches the condition it returns TRUE

(This query uses the openly available GA data from Google. It's a bit old but good to play around with.)

Upvotes: 2

Related Questions