Patthebug

Reputation: 4797

Monthly retention in Amazon Redshift

I'm trying to calculate monthly retention rate in Amazon Redshift and have come up with the following query:

Query 1

SELECT EXTRACT(year FROM activity.created_at) AS Year,
       EXTRACT(month FROM activity.created_at) AS Month,
       COUNT(DISTINCT activity.member_id) AS active_users,
       COUNT(DISTINCT future_activity.member_id) AS retained_users,
       COUNT(DISTINCT future_activity.member_id) / COUNT(DISTINCT activity.member_id)::float AS retention
FROM ads.fbs_page_view_staging activity
  LEFT JOIN ads.fbs_page_view_staging AS future_activity
         ON activity.mongo_id = future_activity.mongo_id
        AND datediff('month', activity.created_at, future_activity.created_at) = 1
GROUP BY Year,
         Month
ORDER BY Year,
         Month

For some reason this query returns zero retained_users and zero retention. I'd appreciate help understanding why, or a completely different query for monthly retention that works.

I then adapted a query from another SO post:

Query 2

WITH t AS (
   SELECT member_id
         ,date_trunc('month', created_at) AS month
         ,count(*) AS item_transactions
         ,lag(date_trunc('month', created_at)) OVER (PARTITION BY  member_id
                                           ORDER BY date_trunc('month', created_at)) 
          = date_trunc('month', created_at) - interval '1 month'
            OR NULL AS repeat_transaction
   FROM   ads.fbs_page_view_staging
   WHERE  created_at >= '2016-01-01'::date
   AND    created_at <  '2016-04-01'::date -- time range of interest.
   GROUP  BY 1, 2
   )
SELECT month
      ,sum(item_transactions) AS num_trans
      ,count(*) AS num_buyers
      ,count(repeat_transaction) AS repeat_buyers
      ,round(
          CASE WHEN sum(item_transactions) > 0
             THEN count(repeat_transaction) / sum(item_transactions) * 100
             ELSE 0
          END, 2) AS buyer_retention
FROM   t
GROUP  BY 1
ORDER  BY 1;

This query gives me the following error:

An error occurred when executing the SQL command:
WITH t AS (
   SELECT member_id
         ,date_trunc('month', created_at) AS month
         ,count(*) AS item_transactions
         ,lag(date_trunc('m...

[Amazon](500310) Invalid operation: Interval values with month or year parts are not supported
Details: 
 -----------------------------------------------
  error:  Interval values with month or year parts are not supported
  code:      8001
  context:   interval months: "1"
  query:     616822
  location:  cg_constmanager.cpp:145
  process:   padbmaster [pid=15116]
  -----------------------------------------------;

I have a feeling that Query 2 would fare better than Query 1, so I'd prefer to fix the error on that.

Any help would be much appreciated.

Upvotes: 2

Views: 2114

Answers (1)

Shiva

Reputation: 651

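Query 1 is structured correctly, but the join condition is the problem. You self-join ads.fbs_page_view_staging on mongo_id and compare created_at from both sides of the join. Assuming mongo_id is unique, every row matches only itself, so datediff('month', activity.created_at, future_activity.created_at) always returns 0 and the condition datediff(...) = 1 is always false. That is why retained_users and retention come back as zero. The datediff('month', ...) = 1 filter itself works; I ran a similar one against my own tables (the fact_events query in this answer).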
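
If member_id is the column that identifies a user across page views (an assumption about your table, as is mongo_id being a per-row key), joining on it instead of mongo_id lets each row match the same member's activity in the following month. A minimal sketch of Query 1 with only that change, plus a NULLIF guard on the division; I have not run it against your data:

```sql
-- Sketch only: assumes member_id identifies a user and mongo_id is unique per row.
SELECT EXTRACT(year  FROM activity.created_at) AS year,
       EXTRACT(month FROM activity.created_at) AS month,
       COUNT(DISTINCT activity.member_id)        AS active_users,
       COUNT(DISTINCT future_activity.member_id) AS retained_users,
       COUNT(DISTINCT future_activity.member_id)::float
         / NULLIF(COUNT(DISTINCT activity.member_id), 0) AS retention
FROM ads.fbs_page_view_staging AS activity
  LEFT JOIN ads.fbs_page_view_staging AS future_activity
         -- join on the user, not on the row key
         ON activity.member_id = future_activity.member_id
        -- keep only activity that falls in the next calendar month
        AND datediff('month', activity.created_at, future_activity.created_at) = 1
GROUP BY 1, 2
ORDER BY 1, 2;
```

Because this joins every page view of a member to every page view of the same member in the next month, it can get expensive on a large table. Reducing the table to distinct (member_id, month) pairs before the self-join should give the same counts with far fewer rows.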

-- Count distinct events of join_col_id that have lapsed for one month.
SELECT count(distinct E.join_col_id) dist_ct
FROM public.fact_events E
JOIN public.dim_table Z
  ON E.join_col_id = Z.join_col_id
WHERE datediff('month', event_time, sysdate) = 1;

-- Result: dist_ct = 2771654
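
As for the error in Query 2: it comes from the interval '1 month' arithmetic, which Redshift rejects. One way around it is to compare the two months with datediff('month', ...), the same function Query 1 already uses. The following is a sketch under assumptions, not a tested fix: the CTE names monthly and flagged are made up, the boolean "OR NULL" trick is replaced by a CASE expression, and the rate is computed as repeat_buyers / num_buyers rather than dividing by sum(item_transactions), which is my reading of what "buyer retention" should mean. Multiplying by 100.0 also avoids the integer division in the original.

```sql
-- Sketch only: same table, columns and date range as Query 2.
WITH monthly AS (
    -- one row per member per calendar month
    SELECT member_id,
           date_trunc('month', created_at) AS activity_month,
           count(*) AS item_transactions
    FROM   ads.fbs_page_view_staging
    WHERE  created_at >= '2016-01-01'::date
    AND    created_at <  '2016-04-01'::date -- time range of interest
    GROUP  BY 1, 2
),
flagged AS (
    SELECT member_id,
           activity_month,
           item_transactions,
           -- 1 when the member's previous active month is exactly one month earlier,
           -- NULL otherwise so that count() skips the row
           CASE WHEN datediff('month',
                              lag(activity_month) OVER (PARTITION BY member_id
                                                        ORDER BY activity_month),
                              activity_month) = 1
                THEN 1
           END AS repeat_transaction
    FROM   monthly
)
SELECT activity_month,
       sum(item_transactions)    AS num_trans,
       count(*)                  AS num_buyers,
       count(repeat_transaction) AS repeat_buyers,
       round(100.0 * count(repeat_transaction) / count(*), 2) AS buyer_retention
FROM   flagged
GROUP  BY 1
ORDER  BY 1;
```

Note that the first month in the range will always show zero repeat buyers, because the WHERE clause filters out the month before it. Also, this version looks backward (was the member active in the previous month), whereas Query 1 looks forward (is the member active in the next month), so the two will not report the same figure for a given month.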

Upvotes: 1
