Bikash Behera
Bikash Behera

Reputation: 444

How to selectively join calendar table with varying date ranges in another table to get the counts when there is no record?

I have a leads table that looks like below, showing which user (user_uuid) dropped which lead(lead_id) along with their subscription start and end date.

lead_id     lead_date       subs_start          subs_end            user_uuid           subs_id
--------    -----------     -----------         -----------         ------------------  ------
25914112    20-Mar-2019     05-Mar-2019         28-Mar-2019         5447ade6-1401-44bd  79926
25811730    06-Mar-2019     05-Mar-2019         28-Mar-2019         5447ade6-1401-44bd  79926
28597464    08-Mar-2019     05-Mar-2019         28-Mar-2019         5447ade6-1401-44bd  79926
28591844    08-Mar-2019     05-Mar-2019         28-Mar-2019         5447ade6-1401-44bd  79926
25914076    15-Jun-2019     15-Jun-2019         28-Jun-2019         00013889-00ce-4eb3  99927
26006221    17-Jun-2019     15-Jun-2019         28-Jun-2019         00013889-00ce-4eb3  99927
26095282    17-Jun-2019     15-Jun-2019         28-Jun-2019         00013889-00ce-4eb3  99927
26093874    19-Jun-2019     15-Jun-2019         28-Jun-2019         00013889-00ce-4eb3  99927
26091313    27-Jun-2019     15-Jun-2019         28-Jun-2019         00013889-00ce-4eb3  99927
25112857    27-Jun-2019     15-Jun-2019         28-Jun-2019         00013889-00ce-4eb3  99927

subs_start and subs_end are the subscription start and end date respectively.

I am trying to find out how many leads are dropped by each user on a daily basis, in the subscription period (between subscription start date and end date),in below format.

user_uuid           date          leads_dropped
------------------  -----------   ------------
5447ade6-1401-44bd  05-Mar-2019   0
5447ade6-1401-44bd  06-Mar-2019   1
5447ade6-1401-44bd  07-Mar-2019   0
5447ade6-1401-44bd  08-Mar-2019   2
5447ade6-1401-44bd  09-Mar-2019   0
00013889-00ce-4eb3  15-Jun-2019   0
00013889-00ce-4eb3  16-Jun-2019   0
00013889-00ce-4eb3  17-Jun-2019   2
00013889-00ce-4eb3  18-Jun-2019   0
00013889-00ce-4eb3  19-Jun-2019   0

I've tried using the calendar table, but reached no where. Please help me find me the number of leads for the dates only with in the subscription period of every user.

WITH grouped_daily AS (
SELECT user_uuid,lead_creation_date, COUNT(lead_id) cnt
FROM leads
GROUP BY 1,2)

SELECT *
FROM (SELECT dt::date FROM generate_series(date '2019-01-01', date '2019-09-14', INTERVAL '1 day') dt) dt_tbl 
        LEFT JOIN grouped_daily gd
        ON gd.lead_creation_date = dt_tbl.dt

Upvotes: 1

Views: 37

Answers (1)

Gordon Linoff
Gordon Linoff

Reputation: 1269503

From what I understand, you need to generate separate dates for each user. So, aggregate at the user level to get the bounds. Then cross join to the dates and bring back in the leads information:

WITH u AS (
      SELECT user_uuid, MIN(subs_start) as min_ss, MAX(subs_end) as max_ss
      FROM leads
      GROUP BY user_uuid
     )
SELECT u.user_uuid, d.dt, COUNT(l.user_uuid)
FROM u CROSS JOIN LATERAL
     generate_series(min_ss, max_ss, INTERVAL '1 day') d(dt) LEFT JOIN
     leads l
     ON l.user_uuid = u.user_uuid AND
        l.lead_date = d.dt
GROUP BY u.user_uuid, d.dt;

EDIT:

A lateral join just makes the syntax more consistent with SQL -- I am not a big fan of calling table-generating functions in the SELECT:

WITH u AS (
      SELECT user_uuid, MIN(subs_start) as min_ss, MAX(subs_end) as max_ss
      FROM leads
      GROUP BY user_uuid
     )
SELECT u.user_uuid, u.dt, COUNT(l.user_uuid)
FROM (SELECT  u.*,
              generate_series(min_ss, max_ss, INTERVAL '1 day') as dt
      FROM u 
     ) LEFT JOIN
     leads l
     ON l.user_uuid = u.user_uuid AND
        l.lead_date = u.dt
GROUP BY u.user_uuid, u.dt;

Upvotes: 1

Related Questions