UKDataGeek
UKDataGeek

Reputation: 6882

Building a basic funnel using Google Big Query

I noticed that there are a lot of users of Google Analytics with Google BigQuery but the documentation is quite limited. Is it possible to help generate a simple funnel that shows Users who visited /pageA then /pageB and then /pageC

I have seen lots of different approaches - and I am not clear what the "correct" way is to do this.

Upvotes: 1

Views: 4784

Answers (2)

Martin Weitzmann
Martin Weitzmann

Reputation: 4736

You can concatenate hits of users first with array_concat_agg() and then do your calculations based on the new user scoped table. It highly depends on your chosen time frame of course.

Here for instance with dummy data from Google:

#standardSQL
WITH arrAgg AS (
  SELECT
    fullvisitorid,
    -- concatenate arrays over multiple sessions
    ARRAY_CONCAT_AGG(hits ORDER BY visitstarttime ASC) userHits
  FROM
    `google.com:analytics-bigquery.LondonCycleHelmet.ga_sessions_20130910`
  GROUP BY 1
)
, journey AS (
  SELECT 
    fullvisitorId,
    -- get a proper running index with combination of unnest and offset of aggregated hits array
    ARRAY( (SELECT AS STRUCT index+1 as hitNumber, page FROM UNNEST(userHits) WITH OFFSET AS index)) as hits
  FROM arrAgg
)

SELECT * FROM journey

When you run this you can see the new "raw material". In the first step I concatenate the hits, in the second step I make a proper index for the pages and put eerything back into a "hits" array.

You can build your user journey using cross joins and comparing steps and sequence of the page:

#standardSQL
WITH arrAgg AS (
  SELECT
    fullvisitorid,
    SUM(totals.visits) sessions,
    -- concatenate arrays over multiple sessions
    ARRAY_CONCAT_AGG(hits ORDER BY visitstarttime ASC) userHits
  FROM
    `google.com:analytics-bigquery.LondonCycleHelmet.ga_sessions_20130910`
  GROUP BY 1
)
, journey AS (
  SELECT 
    fullvisitorId,
    sessions,
    -- get a proper running index with combination of unnest and offset of aggregated hits array
    ARRAY( (SELECT AS STRUCT index+1 as hitNumber, page FROM UNNEST(userHits) WITH OFFSET AS index WHERE type='PAGE')) as hits
  FROM arrAgg
)
-- funnel: homepage: /, login: /login.html, basket: /basket.html, confirm: /confirm.html
SELECT 
  SUM(sessions) allSessions,
  COUNT(1) allUsers,
  -- check if any page was home page
  SUM( (SELECT IF( LOGICAL_OR(page.pagePath='/'), 1, 0) FROM j.hits) ) step1_home,
  -- cross join hits array with itself: combination of all pages with all pages: any of those combinations our two pages? came home before login?: if yes for any given amount add up 1
  SUM( (SELECT IF( LOGICAL_OR(a.page.pagePath='/' AND b.page.pagePath='/login.html' AND a.hitNumber < b.hitNumber) ,1, 0 ) FROM j.hits a CROSS JOIN j.hits b) ) step2_login,
  -- extend cross join principle to a third page
  SUM( (SELECT IF( LOGICAL_OR(
      a.page.pagePath='/' AND b.page.pagePath='/login.html' AND c.page.pagePath='/basket.html' AND
      a.hitNumber < b.hitNumber AND b.hitNumber < c.hitNumber 
      ) ,1, 0 ) FROM j.hits a CROSS JOIN j.hits b CROSS JOIN j.hits c) ) step3_basket,
  -- extend cross join principle to a fourth page
  SUM( (SELECT IF( LOGICAL_OR(
      a.page.pagePath='/' AND b.page.pagePath='/login.html' AND c.page.pagePath='/basket.html' AND d.page.pagePath='/confirm.html' AND
      a.hitNumber < b.hitNumber AND b.hitNumber < c.hitNumber AND c.hitNumber < d.hitNumber
      ) ,1, 0 ) FROM j.hits a CROSS JOIN j.hits b CROSS JOIN j.hits c CROSS JOIN j.hits d) ) step4_confirm
FROM journey j

Since everything operates with subqueries on arrays it should scale quite well, because of parallelization. Please test it before using it - I didn't ;) But it should point in the right direction.

Upvotes: 3

citrus
citrus

Reputation: 21

Check here: https://online-behavior.com/analytics/funnel-analysis

Alternatively, if you want to do it by hand:

  1. Select all sessions landing on start page (record smallest hit number)
  2. LEFT JOIN with select all sessions on next page where hit number is greater than hit number from start page (match session IDs)
  3. Repeat step 2 until the funnel is complete
  4. Count all sessions grouped by page name to aggregate

Upvotes: 2

Related Questions