Mia
Mia

Reputation: 3

SQL - Output a daily user status table from a user activity table

I have a table that has users' activiies on their membership (activate or deactivate) and when those activities happened. The activity table look like this:

user_id activity date
123 activate 06/01/2024
123 deactivate 06/15/2024
123 activate 06/20/2024
123 deactivate 06/30/2024
456 activate 06/25/2024
123 deactivate 07/08/2024
123 activate 07/10/2024

There is another table called dim_date that basically has one row for each date until today like this:

date
06/01/2024
06/02/2024
06/03/2024
...
07/21/2024

What is the optimal way to write a SQL to return a daily status table that has one row for each user per day showing whether their membership status is active or inactive? The output table should look like:

user_id date membership_status
123 06/01/2024 active
123 06/02/2024 active
123 ... active
123 06/14/2024 active
123 06/15/2024 inactive
123 06/16/2024 inactive
123 ... inactive
123 06/19/2024 inactive
123 06/20/2024 active
123 06/21/2024 active
123 ... active
123 06/29/2024 active
123 06/30/2024 inactive
123 07/01/2024 inactive
123 ... inactive
123 07/21/2024 inactive

Currently, what I have is:

with cte as ( 
select   
a.user_id   
,a.activity   
,a.date as activity_date   
,dd.date   
,row_number() over (partition by a.user_id, dd.date order by a.date desc) as rn 
from activity a 
left join dim_date dd on a.date <= dd.date 
) 
select    
user_id   
,date   
,case when a.activity = "activate" then "active" else "inactive" end as membership_status 
from cte 
where rn = 1

Was wondering if anyone could provide something simpler? Possibly without cte/subquery?

Upvotes: 0

Views: 52

Answers (2)

Error_2646
Error_2646

Reputation: 3781

"Simpler" is subjective. The main concern with your original query isn't the complexity, it's not terribly complex. But it will have a large intermediate result set with a lot of redundancy for row_number() to reckon with only having the a.date <= dd.date condition.

Explicitly deriving an effective (start/end) range with lead() and joining on where the date falls within that range will probably be more performant.

Fiddle: https://dbfiddle.uk/8WcyWwOa

Note: coalesce() to '9999-12-31' is a bit of syntactic sugar just to force the dimension date to fall below the derived end of the user's last status.

CREATE TABLE activities (
    user_id INTEGER,
    activity VARCHAR(20),
    date DATE
);

INSERT INTO activities (user_id, activity, date) VALUES
(123, 'activate', '2024-06-01'),
(123, 'deactivate', '2024-06-15'),
(123, 'activate', '2024-06-20'),
(123, 'deactivate', '2024-06-30'),
(456, 'activate', '2024-06-25'),
(123, 'deactivate', '2024-07-08'),
(123, 'activate', '2024-07-10');

CREATE TABLE dim_date (
    date_value DATE
);

INSERT INTO dim_date (date_value)
SELECT generate_series('2024-06-01'::DATE, '2024-07-22'::DATE, '1 day'::INTERVAL)::DATE;

with assign_end_date_to_status as 
  (
   select a.*,
          coalesce(
            lead(a.date) over ( partition by a.user_id order by a.date) 
              - interval '1 DAY', '9999-12-31'::DATE)  as status_end
     from activities a
  )
select ads.user_id,
       dd.date_value,
       case when ads.activity = 'activate' then 'active'
            else 'inactive'
        end as membership_status
  from assign_end_date_to_status ads
 inner
  join dim_date dd
    on dd.date_value between ads.date and ads.status_end;

Upvotes: 0

Stefanov.sm
Stefanov.sm

Reputation: 13029

Maybe not really simpler but I hope clear and easy to read using a scalar subquery for membership_status.

select dd.date, user_id,
 (
   select case a.activity when 'activate' then 'active' when 'deactivate' then 'inactive' end
   from activity a
   where user_id = u.user_id and a.date <= dd.date
   order by a.date desc limit 1
 )  membership_status
from dim_date dd
cross join (select distinct user_id from activity) u
order by user_id, dd.date;

DB Fiddle demo

Upvotes: 0

Related Questions