Reputation: 28
What I'm trying to do in come up with a single query that can give the percentage of repeats within 30 days of an initial event, but only count any events within 30 days as a single repeat. Here's a sample data set for a single person:
Person Date
══════════════
A 3/1/14
A 3/21/14
A 3/29/14
A 4/14/14
A 4/17/14
In this case, 3/21 would be the repeat event, and 3/29 wouldn't be counted as a second. 4/14 would be the start of the next window, with 4/17 being the second repeat.
To calculate the percentage of repeats here, the numerator would be the distinct count of people who had an initial event in the month and also had a subsequent event within 30 days. The denominator is a distinct count of people with events in that month. In the case of crossing months, the repeat is counted within the month of the initial event.
I know I could come up with something that uses a loop/cursor or temp table, but as the data set grows, it's going to take forever. Does anyone have any thoughts on how to do this as a single query? It's probably going to involve a couple of CTE's. Everything I've come up with so far has failed.
Upvotes: 1
Views: 206
Reputation: 3145
Nice one... try this:
create table #t (Person varchar(10), EventDate date);
insert #t (Person, EventDate)
values
('A', '3/1/14'),
('A', '3/21/14'),
('A', '3/29/14'),
('A', '4/14/14'),
('A', '4/17/14'),
('A', '8/3/14'),
('B', '3/25/14'),
('B', '4/2/14'),
('B', '4/20/14'),
('B', '6/14/14'),
('B', '8/17/14'),
('B', '8/26/14');
;WITH OrderedEvents AS (
SELECT Person, EventDate, ROW_NUMBER() OVER (PARTITION BY Person ORDER BY EventDate) AS Ord
FROM #t
)
, RepeatedEvents AS (
SELECT Person, EventDate, Ord, EventDate AS InitialDate
FROM OrderedEvents
WHERE Ord = 1
UNION ALL
SELECT o.Person, o.EventDate, o.Ord
, CASE WHEN DATEDIFF(DAY, r.InitialDate, o.EventDate) > 30 THEN o.EventDate ELSE r.InitialDate END
FROM OrderedEvents o
JOIN RepeatedEvents r ON o.Person = r.Person AND o.Ord = r.Ord + 1
)
, GroupedEvents AS (
SELECT Person, MONTH(InitialDate) AS Mth, YEAR(InitialDate) AS Yr
, IsRepeat = CASE WHEN COUNT(*) > 1 THEN 1 ELSE 0 END
FROM RepeatedEvents
GROUP BY Person, MONTH(InitialDate), YEAR(InitialDate)
)
SELECT Mth, Yr, CAST(SUM(IsRepeat) AS NUMERIC) / CAST(COUNT(DISTINCT person) AS NUMERIC) AS Pct
FROM GroupedEvents
GROUP BY Mth, Yr;
Upvotes: 1