Rocco
Rocco

Reputation: 153

How to find inactive users in ga4_obfuscated_sample_ecommerce public dataset

I recently started using GCP and am learning how to work with the public datasets.

I am trying to identify N day - inactive users with SQL.

You can find the dataset here: https://developers.google.com/analytics/bigquery/web-ecommerce-demo-dataset

going through the documentation I found the following code.

/**
 * Builds an audience of N-Day Inactive Users.
 *
 * N-Day inactive users = users in the last M days who have not logged one  
 * event with event param engagement_time_msec > 0 in the last N days 
 *  where M > N.
 */

 
SELECT
  COUNT(DISTINCT MDaysUsers.user_id) AS n_day_inactive_users_count
FROM
  (
    SELECT
      user_id
    FROM
      /* PLEASE REPLACE WITH YOUR TABLE NAME */
      `YOUR_TABLE.events_*` AS T
    CROSS JOIN
      T.event_params
    WHERE
      event_params.key = 'engagement_time_msec' AND event_params.value.int_value > 0
      /* Has engaged in last M = 7 days */
      AND event_timestamp >
          UNIX_MICROS(TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY))
      /* PLEASE REPLACE WITH YOUR DESIRED DATE RANGE */
      AND _TABLE_SUFFIX BETWEEN '20180521' AND '20240131'
  ) AS MDaysUsers
-- EXCEPT ALL is not yet implemented in BigQuery. Use LEFT JOIN in the interim.
LEFT JOIN
  (
    SELECT
      user_id
    FROM
      /* PLEASE REPLACE WITH YOUR TABLE NAME */
      `YOUR_TABLE.events_*`AS T
    CROSS JOIN
      T.event_params
    WHERE
      event_params.key = 'engagement_time_msec' AND event_params.value.int_value > 0
      /* Has engaged in last N = 2 days */
      AND event_timestamp >
          UNIX_MICROS(TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 2 DAY))
      /* PLEASE REPLACE WITH YOUR DESIRED DATE RANGE */
      AND _TABLE_SUFFIX BETWEEN '20180521' AND '20240131'
  ) AS NDaysUsers
  ON MDaysUsers.user_id = NDaysUsers.user_id
WHERE
  NDaysUsers.user_id IS NULL;

However, when replacing the user_id with user_pseudo_id, the YOUR_TABLE value with bigquery-public-data.ga4_obfuscated_sample_ecommerce and correcting the dates I get no results.

Is there a way to apply this code to the sample dataset?

Upvotes: 0

Views: 216

Answers (1)

Jose Gutierrez Paliza
Jose Gutierrez Paliza

Reputation: 1428

What I can see is that you have to erase the line of _TABLE_SUFFIX because the dataset of bigquery-public-data.ga4_obfuscated_sample_ecommerce has only one event table.

What I did was to get the maximum timestamp date to replace it with the CURRENT_TIMESTAMP() to correct the date.

SELECT MAX(TIMESTAMP_MICROS(event_timestamp)) as time
FROM `bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_*

After that I changed the values that you said by replacing user_id with user_pseudo_id and corrected the time and the table name.

Consider below approach:

SELECT
 COUNT(DISTINCT MDaysUsers.user_pseudo_id) AS n_day_inactive_users_count
FROM
 (
   SELECT
     user_pseudo_id
   FROM
     /* PLEASE REPLACE WITH YOUR TABLE NAME */
     `bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_*` AS T
   CROSS JOIN
     T.event_params
   WHERE
     event_params.key = 'engagement_time_msec' AND event_params.value.int_value > 0
     /* Has engaged in last M = 7 days */
     AND event_timestamp >
         UNIX_MICROS(TIMESTAMP_SUB("2021-01-31 23:59:55.412363", INTERVAL 7 DAY))
     /* PLEASE REPLACE WITH YOUR DESIRED DATE RANGE */
 ) AS MDaysUsers
-- EXCEPT ALL is not yet implemented in BigQuery. Use LEFT JOIN in the interim.
LEFT JOIN
 (
   SELECT
     user_pseudo_id
   FROM
     /* PLEASE REPLACE WITH YOUR TABLE NAME */
     `bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_*`AS T
   CROSS JOIN
     T.event_params
   WHERE
     event_params.key = 'engagement_time_msec' AND event_params.value.int_value > 0
     /* Has engaged in last N = 2 days */
     AND event_timestamp >
         UNIX_MICROS(TIMESTAMP_SUB("2021-01-31 23:59:55.412363", INTERVAL 2 DAY))
     /* PLEASE REPLACE WITH YOUR DESIRED DATE RANGE */
 ) AS NDaysUsers
 ON MDaysUsers.user_pseudo_id = NDaysUsers.user_pseudo_id
WHERE
 NDaysUsers.user_pseudo_id IS NULL;

The result that it shows is the next:

enter image description here

Upvotes: 1

Related Questions