jetboy
jetboy

Reputation: 172

Page combinations in BigQuery's Google Analytics data

Using Google Analytics data in Google BigQuery, I can derive the number of sessions that include a hit on the /confirm/ page with:

#standardSQL
SELECT date AS Date, COUNT(Date) AS Sessions
FROM (
  SELECT date
  FROM `123456789.ga_sessions_20161202`
  CROSS JOIN UNNEST(hits) as hit
  WHERE hit.type = 'PAGE' AND REGEXP_CONTAINS(hit.page.pagePath, '/confirm/$')
  GROUP BY VisitId, fullVisitorId, date
)
GROUP BY Date
ORDER BY Date ASC, Sessions ASC;

How about if I wanted to show the number of sessions that hit both the /confirm/ page and the /payment/ page? What should my SQL look like?

Upvotes: 0

Views: 507

Answers (4)

Mikhail Berlyant
Mikhail Berlyant

Reputation: 172944

Try below, should give you an idea

#standardSQL
SELECT DATE, COUNT(1) AS Sessions
FROM `123456789.ga_sessions_20161202`
WHERE (SELECT COUNT(1) FROM UNNEST(hits) AS hit WHERE hit.type = 'PAGE' 
                 AND REGEXP_CONTAINS(hit.page.pagePath, '/confirm/$') ) > 0
AND (SELECT COUNT(1) FROM UNNEST(hits) AS hit WHERE hit.type = 'PAGE' 
                 AND REGEXP_CONTAINS(hit.page.pagePath, '/payment/$') ) > 0
GROUP BY DATE
ORDER BY DATE ASC, Sessions ASC;

Above can be further optimized as below

#standardSQL
SELECT DATE, COUNT(1) AS Sessions
FROM `123456789.ga_sessions_20161202`
WHERE (SELECT COUNTIF(REGEXP_CONTAINS(hit.page.pagePath, '/confirm/$')) *
          COUNTIF(REGEXP_CONTAINS(hit.page.pagePath, '/payment/$'))
        FROM UNNEST(hits) AS hit WHERE hit.type = 'PAGE') > 0
GROUP BY DATE
ORDER BY DATE ASC, Sessions ASC;

Upvotes: 1

Elliott Brossard
Elliott Brossard

Reputation: 33705

The most efficient way to do this is to use a single subquery in the WHERE clause that checks for both hit types. For example,

#standardSQL
SELECT DATE, COUNT(*) AS Sessions
FROM `123456789.ga_sessions_20161202`
WHERE ((SELECT COUNTIF(hit.page.pagePath LIKE '%confirm/$') > 0 AND
          COUNTIF(hit.page.pagePath LIKE '%/payment/$%') > 0
        FROM UNNEST(hits) AS hit WHERE hit.type = 'PAGE'))
GROUP BY DATE
ORDER BY DATE ASC, Sessions ASC;

Upvotes: 2

Willian Fuks
Willian Fuks

Reputation: 11777

I tested this query in our GA dataset and it might work for you:

#standardSQL
SELECT
date,
COUNT(DISTINCT CONCAT(fv, CAST(v AS string))) sessions
FROM(
  SELECT 
  date,
  fullvisitorid fv,
  visitid v,
  CASE WHEN (MAX(CASE WHEN REGEXP_CONTAINS(hit.page.pagePath, '/confirm/$') THEN TRUE END) AND MAX(CASE WHEN REGEXP_CONTAINS(hit.page.pagePath, '/payment/$') THEN TRUE END)) THEN TRUE END flag
  FROM `dafiti-analytics.40663402.ga_sessions_20170102`,
  UNNEST(hits) hit
  WHERE 1 = 1
  AND hit.type = 'PAGE' AND REGEXP_CONTAINS(hit.page.pagePath, r'/confirm/$|/payment/$')
  GROUP BY fv, v, date
  HAVING flag IS NOT NULL
)
GROUP BY 
    date

So first thing is that I selected all users and their sessions that contained either confirm or payment in their hits.page.pagepath field.

After that, I used the MAX operation and grouped by the visitors and their sessions to find when both confirm and payment happened, like so:

 CASE WHEN (MAX(CASE WHEN REGEXP_CONTAINS(hit.page.pagePath, '/confirm/$') THEN TRUE END) AND MAX(CASE WHEN REGEXP_CONTAINS(hit.page.pagePath, '/payment/$') THEN TRUE END)) THEN TRUE END flag

flag is true when a given visitor on a given session had both confirm and payment in their navigation.

And then just COUNT DISTINCT the concatenation of visitors and their sessions to get total sessions (this is so because the visitid is not unique among visitors).

Upvotes: 0

Squiggs.
Squiggs.

Reputation: 4474

try something like this, assuming a Regex

 WHERE hit.type = 'PAGE' AND REGEXP_CONTAINS(hit.page.pagePath, '(/confirm/$)|(/payment/$)') 

Upvotes: 0

Related Questions