Reputation: 172
Using Google Analytics data in Google BigQuery, I can derive the number of sessions that include a hit on the /confirm/ page with:
#standardSQL
SELECT date AS Date, COUNT(Date) AS Sessions
FROM (
SELECT date
FROM `123456789.ga_sessions_20161202`
CROSS JOIN UNNEST(hits) as hit
WHERE hit.type = 'PAGE' AND REGEXP_CONTAINS(hit.page.pagePath, '/confirm/$')
GROUP BY VisitId, fullVisitorId, date
)
GROUP BY Date
ORDER BY Date ASC, Sessions ASC;
How about if I wanted to show the number of sessions that hit both the /confirm/ page and the /payment/ page? What should my SQL look like?
Upvotes: 0
Views: 507
Reputation: 172944
Try below, should give you an idea
#standardSQL
SELECT DATE, COUNT(1) AS Sessions
FROM `123456789.ga_sessions_20161202`
WHERE (SELECT COUNT(1) FROM UNNEST(hits) AS hit WHERE hit.type = 'PAGE'
AND REGEXP_CONTAINS(hit.page.pagePath, '/confirm/$') ) > 0
AND (SELECT COUNT(1) FROM UNNEST(hits) AS hit WHERE hit.type = 'PAGE'
AND REGEXP_CONTAINS(hit.page.pagePath, '/payment/$') ) > 0
GROUP BY DATE
ORDER BY DATE ASC, Sessions ASC;
Above can be further optimized as below
#standardSQL
SELECT DATE, COUNT(1) AS Sessions
FROM `123456789.ga_sessions_20161202`
WHERE (SELECT COUNTIF(REGEXP_CONTAINS(hit.page.pagePath, '/confirm/$')) *
COUNTIF(REGEXP_CONTAINS(hit.page.pagePath, '/payment/$'))
FROM UNNEST(hits) AS hit WHERE hit.type = 'PAGE') > 0
GROUP BY DATE
ORDER BY DATE ASC, Sessions ASC;
Upvotes: 1
Reputation: 33705
The most efficient way to do this is to use a single subquery in the WHERE
clause that checks for both hit types. For example,
#standardSQL
SELECT DATE, COUNT(*) AS Sessions
FROM `123456789.ga_sessions_20161202`
WHERE ((SELECT COUNTIF(hit.page.pagePath LIKE '%confirm/$') > 0 AND
COUNTIF(hit.page.pagePath LIKE '%/payment/$%') > 0
FROM UNNEST(hits) AS hit WHERE hit.type = 'PAGE'))
GROUP BY DATE
ORDER BY DATE ASC, Sessions ASC;
Upvotes: 2
Reputation: 11777
I tested this query in our GA
dataset and it might work for you:
#standardSQL
SELECT
date,
COUNT(DISTINCT CONCAT(fv, CAST(v AS string))) sessions
FROM(
SELECT
date,
fullvisitorid fv,
visitid v,
CASE WHEN (MAX(CASE WHEN REGEXP_CONTAINS(hit.page.pagePath, '/confirm/$') THEN TRUE END) AND MAX(CASE WHEN REGEXP_CONTAINS(hit.page.pagePath, '/payment/$') THEN TRUE END)) THEN TRUE END flag
FROM `dafiti-analytics.40663402.ga_sessions_20170102`,
UNNEST(hits) hit
WHERE 1 = 1
AND hit.type = 'PAGE' AND REGEXP_CONTAINS(hit.page.pagePath, r'/confirm/$|/payment/$')
GROUP BY fv, v, date
HAVING flag IS NOT NULL
)
GROUP BY
date
So first thing is that I selected all users and their sessions that contained either confirm or payment in their hits.page.pagepath
field.
After that, I used the MAX
operation and grouped by the visitors and their sessions to find when both confirm
and payment
happened, like so:
CASE WHEN (MAX(CASE WHEN REGEXP_CONTAINS(hit.page.pagePath, '/confirm/$') THEN TRUE END) AND MAX(CASE WHEN REGEXP_CONTAINS(hit.page.pagePath, '/payment/$') THEN TRUE END)) THEN TRUE END flag
flag
is true when a given visitor on a given session had both confirm
and payment
in their navigation.
And then just COUNT DISTINCT
the concatenation of visitors and their sessions to get total sessions (this is so because the visitid
is not unique among visitors).
Upvotes: 0
Reputation: 4474
try something like this, assuming a Regex
WHERE hit.type = 'PAGE' AND REGEXP_CONTAINS(hit.page.pagePath, '(/confirm/$)|(/payment/$)')
Upvotes: 0