Reputation: 3
I have a table that has users' activiies on their membership (activate or deactivate) and when those activities happened. The activity table look like this:
user_id | activity | date |
---|---|---|
123 | activate | 06/01/2024 |
123 | deactivate | 06/15/2024 |
123 | activate | 06/20/2024 |
123 | deactivate | 06/30/2024 |
456 | activate | 06/25/2024 |
123 | deactivate | 07/08/2024 |
123 | activate | 07/10/2024 |
There is another table called dim_date that basically has one row for each date until today like this:
date |
---|
06/01/2024 |
06/02/2024 |
06/03/2024 |
... |
07/21/2024 |
What is the optimal way to write a SQL to return a daily status table that has one row for each user per day showing whether their membership status is active or inactive? The output table should look like:
user_id | date | membership_status |
---|---|---|
123 | 06/01/2024 | active |
123 | 06/02/2024 | active |
123 | ... | active |
123 | 06/14/2024 | active |
123 | 06/15/2024 | inactive |
123 | 06/16/2024 | inactive |
123 | ... | inactive |
123 | 06/19/2024 | inactive |
123 | 06/20/2024 | active |
123 | 06/21/2024 | active |
123 | ... | active |
123 | 06/29/2024 | active |
123 | 06/30/2024 | inactive |
123 | 07/01/2024 | inactive |
123 | ... | inactive |
123 | 07/21/2024 | inactive |
Currently, what I have is:
with cte as (
select
a.user_id
,a.activity
,a.date as activity_date
,dd.date
,row_number() over (partition by a.user_id, dd.date order by a.date desc) as rn
from activity a
left join dim_date dd on a.date <= dd.date
)
select
user_id
,date
,case when a.activity = "activate" then "active" else "inactive" end as membership_status
from cte
where rn = 1
Was wondering if anyone could provide something simpler? Possibly without cte/subquery?
Upvotes: 0
Views: 52
Reputation: 3781
"Simpler" is subjective. The main concern with your original query isn't the complexity, it's not terribly complex. But it will have a large intermediate result set with a lot of redundancy for row_number() to reckon with only having the a.date <= dd.date
condition.
Explicitly deriving an effective (start/end) range with lead()
and joining on where the date falls within that range will probably be more performant.
Fiddle: https://dbfiddle.uk/8WcyWwOa
Note: coalesce()
to '9999-12-31' is a bit of syntactic sugar just to force the dimension date to fall below the derived end of the user's last status.
CREATE TABLE activities (
user_id INTEGER,
activity VARCHAR(20),
date DATE
);
INSERT INTO activities (user_id, activity, date) VALUES
(123, 'activate', '2024-06-01'),
(123, 'deactivate', '2024-06-15'),
(123, 'activate', '2024-06-20'),
(123, 'deactivate', '2024-06-30'),
(456, 'activate', '2024-06-25'),
(123, 'deactivate', '2024-07-08'),
(123, 'activate', '2024-07-10');
CREATE TABLE dim_date (
date_value DATE
);
INSERT INTO dim_date (date_value)
SELECT generate_series('2024-06-01'::DATE, '2024-07-22'::DATE, '1 day'::INTERVAL)::DATE;
with assign_end_date_to_status as
(
select a.*,
coalesce(
lead(a.date) over ( partition by a.user_id order by a.date)
- interval '1 DAY', '9999-12-31'::DATE) as status_end
from activities a
)
select ads.user_id,
dd.date_value,
case when ads.activity = 'activate' then 'active'
else 'inactive'
end as membership_status
from assign_end_date_to_status ads
inner
join dim_date dd
on dd.date_value between ads.date and ads.status_end;
Upvotes: 0
Reputation: 13029
Maybe not really simpler but I hope clear and easy to read using a scalar subquery for membership_status
.
select dd.date, user_id,
(
select case a.activity when 'activate' then 'active' when 'deactivate' then 'inactive' end
from activity a
where user_id = u.user_id and a.date <= dd.date
order by a.date desc limit 1
) membership_status
from dim_date dd
cross join (select distinct user_id from activity) u
order by user_id, dd.date;
Upvotes: 0