Reputation: 1952
I'm trying to translate the logic below into one or more SQL statements:
Finding the first event is easy, as is excluding all events that occur in the 30 days after it. I'm having trouble trying to identify all the following windows/events after the first without looping or scripting.
My solution needs to support multiple RDBMS (MSSQL and Spark SQL at this time, potentially more), so the solution needs to be as close to standard SQL as possible, no platform-specific scripting. I also would like to avoid correlated subqueries if possible for performance reasons. But if that's the only way to solve the problem, I'm open to that as a solution.
Example data:
UserID EventDate
1 2022-01-02
1 2022-01-19
1 2022-02-01
1 2022-02-07
1 2022-02-08
1 2022-03-19
2 2022-01-04
2 2022-01-05
2 2022-01-06
2 2022-02-22
Desired output:
UserID EventDate Include
1 2022-01-02 1
1 2022-01-19 0
1 2022-02-01 0
1 2022-02-07 1
1 2022-02-08 0
1 2022-03-19 1
2 2022-01-04 1
2 2022-01-05 0
2 2022-01-06 0
2 2022-02-22 1
Or:
UserID EventDate
1 2022-01-02
1 2022-02-07
1 2022-03-19
2 2022-01-04
2 2022-02-22
Upvotes: 0
Views: 132
Reputation: 4937
This should work on most DB.
drop table if exists #have;
create table #have
(
UserID [int]
, date [date]
)
;
insert into #have
values
(1, '2022-01-02')
, (1, '2022-01-19')
, (1, '2022-02-01')
, (1, '2022-02-07')
, (1, '2022-02-08')
, (1, '2022-03-19')
, (2, '2022-01-04')
, (2, '2022-01-05')
, (2, '2022-01-06')
, (2, '2022-02-22')
;
with c1 as
(
select *
, isnull(datediff(day, lag(date, 1) over(partition by UserID
order by date), date), 0) as diff
from #have
)
,
c2 as
(
select *, sum(diff) over(partition by UserID
order by date) as s
, sum(diff) over(partition by UserID
order by date) / 31 + 1 as thirty
from c1
)
,
c3 as
(
select UserID
, min(date) as date
from c2
group by UserID, thirty
)
select * from c3
order by UserID, date
;
Upvotes: 1