d_-
d_-

Reputation: 1491

Accessing Struct(s) and Array(s) in Firebase Closed Funnels through BigQuery

I stumbled unto this standard SQL BigQuery documentation this week, which got me started with a Firebase Analytics Closed Funnel. I however got the wrong results (view image below). There should be no users that had a "Tutorial_LessonCompleted" before they did not start a "Tutorial_LessonStarted >> Lesson = 1 " first. This could be because of various reasons.

Questions:

  1. Is it wise to use the User Property = "first_open_time", or is it better to use the Event = "first_open". How would the latter implementation look like ?
  2. I suspect I am perhaps not correctly drilling down to: Event (String = "Tutorial_LessonStarted") >> parameter (String = "LessonNumber") >> value (String = "lesson1")?
  3. How would a filter on _TABLE_SUFFIX = '20170701' possibly work, I read this will be cheaper. Any optimised code suggestions are received with open arms and an up-vote!

    #standardSQL
    SELECT
      step1, step2, step3, step4, step5, step6,
      COUNT(*) AS funnel_count,
      COUNT(DISTINCT user_id) AS users 
    FROM (
    SELECT
        user_dim.app_info.app_instance_id AS user_id,
        event.timestamp_micros AS event_timestamp,
        event.name AS step1,
        LEAD(event.name, 1) OVER (
          PARTITION BY user_dim.app_info.app_instance_id 
          ORDER BY event.timestamp_micros ASC) as step2,
        LEAD(event.name, 2) OVER (
          PARTITION BY user_dim.app_info.app_instance_id 
          ORDER BY event.timestamp_micros ASC) as step3,
        LEAD(event.name, 3) OVER (
          PARTITION BY user_dim.app_info.app_instance_id 
          ORDER BY event.timestamp_micros ASC) as step4,
        LEAD(event.name, 4) OVER (
          PARTITION BY user_dim.app_info.app_instance_id 
          ORDER BY event.timestamp_micros ASC) as step5,
        LEAD(event.name, 5) OVER (
          PARTITION BY user_dim.app_info.app_instance_id 
          ORDER BY event.timestamp_micros ASC) as step6
    FROM
        `......`, 
        UNNEST(event_dim) AS event, 
        UNNEST(user_dim.user_properties) AS user_prop  
    WHERE user_prop.key = "first_open_time"
    ORDER BY 1, 2, 3, 4, 5 ASC
    )
    WHERE step6 = "Tutorial_LessonStarted" AND EXISTS (
    SELECT * 
    FROM `......`, 
    UNNEST(event_dim) AS event,
    UNNEST(event.params)
    WHERE key = 'LessonNumber' AND value.string_value = "lesson1") GROUP BY step1, step2, step3, step4, step5, step6
    ORDER BY funnel_count DESC
    LIMIT 100;
    

Note:

  1. Enter your query table FROM, i.e:project_id.com_game_example_IOS.app_events_20170212,
  2. I left out the funnel_count and user_count.

Output:

Output

----------------------------------------------------------

Update since original question above:

@Elliot: I don’t understand why you said: -- ensure that an event with lesson1 precedes Tutorial_LessonStarted.

Tutorial_LessonStarted has a parameter "LessonNumber" with values lesson1,lesson2,lesson3,lesson4.

I want to count all funnels that took place with a last step in the funnel equal to LessonNumber=lesson1.

So, applied to event log-data for a brand new user's first session (aka: an user that fired first_open_time), the answer would be the table below:

So it is important to firstly get all the users that had a first_open_time on a specific day, as well structure the events into a funnel so that the last event in the funnel is one which matches an event and a specific parameter value, and then form the funnel "backwards" from there.

enter image description here

Upvotes: 1

Views: 1238

Answers (1)

Elliott Brossard
Elliott Brossard

Reputation: 33765

Let me go through some explanation, then see if I can suggest a query to get you started.

It looks like you want to analyze the sequence of events in your analytics data, but the sequence is already there for you--you have an array of the events. Looking at the Firebase schema for BigQuery, event_dim is the relevant column, and unless I'm misunderstanding something, these events are ordered by time. If you want to check what the sixth event's name was, you can use:

event_dim[SAFE_ORDINAL(6)].name

This will evaluate to NULL if there were fewer than six events, or else it will give you the string with the event name.

Another observation is that you are attempting to analyze both event_dim and user_dim, but you are taking the cross product of the two, which will explode the number of rows and make it hard to reason about the results of the query. To look for a specific user property, use an expression of this form:

(SELECT value.value.string_value
 FROM UNNEST(user_dim.user_properties)
 WHERE key = 'first_open_time') = '<expected property value>'

Combining these two filters, your FROM and WHERE clause would look something like this:

FROM `project_id.com_game_example_IOS.app_events_*`
WHERE _TABLE_SUFFIX = '20170701' AND
  event_dim[SAFE_ORDINAL(6)].name = 'Tutorial_LessonStarted' AND
  (SELECT value.value.string_value
   FROM UNNEST(user_dim.user_properties)
   WHERE key = 'first_open_time') = '<expected property value>'

Using the bracket operator to access the steps from event_dim, we can do something like this:

WITH FilteredInput AS (
  SELECT *
  FROM `project_id.com_game_example_IOS.app_events_*`
  WHERE _TABLE_SUFFIX = '20170701' AND
    event_dim[SAFE_ORDINAL(6)].name = 'Tutorial_LessonStarted' AND
    (SELECT value.value.string_value
     FROM UNNEST(user_dim.user_properties)
     WHERE key = 'first_open_time') = '<expected property value>' AND
    -- ensure that an event with lesson1 precedes Tutorial_LessonStarted
    EXISTS (
      SELECT 1
      FROM UNNEST(event_dim) WITH OFFSET event_offset
      CROSS JOIN UNNEST(params)
      WHERE key = 'LessonNumber' AND
        value.string_value = 'lesson1' AND
        event_offset < 5
    )
)
SELECT
  event_dim[ORDINAL(1)].name AS step1,
  event_dim[ORDINAL(2)].name AS step2,
  event_dim[ORDINAL(3)].name AS step3,
  event_dim[ORDINAL(4)].name AS step4,
  event_dim[ORDINAL(5)].name AS step5,
  event_dim[ORDINAL(6)].name AS step6,
  COUNT(*) AS funnel_count,
  COUNT(DISTINCT user_dim.user_id) AS users
FROM FilteredInput
GROUP BY step1, step2, step3, step4, step5, step6;

This will return all unique "paths" along with a count and number of distinct users for each. Note that I'm just writing this off the top of my head--I don't have representative data that I can try it on--so there may be syntax or other errors.

Upvotes: 2

Related Questions