user2903089
user2903089

Reputation: 313

SQLSERVER 2014 rolling distinct count for n days across between two dates

I have a table similar to the below

UsedDate, ProductID, Count, usedBy
10/10/2017, Widgit1, 14, adr
10/10/2017, Widgit1, 20, mmg
10/10/2017, Widgit3, 5, mmg
11/10/2017, widgit2, 1, adr
11/10/2017, widgit1, 15, adr
11/10/2017, widgit2, 15, mmg

13/10/2017, widgit2, 8, adr
13/10/2017, widgit3, 5, adr
13/10/2017, widgit3, 4, mmg

Now for a rolling period in days (@period) between two dates (@startDate, @endDate) I need to calculate the distinct count of ProductID used, so for 10/10 its 2 (widget1 and widget 3) for 11/10 its 2(widget1, widget2) but the rolling count would be 3 along with the start date of the period. However some days might be missing as if nothing was used that day no entries have been stored in the database.

Typically the start date and end date will be a year apart.

I realize I can get the totals by

SELECT
    Cast (dbo.Usage.UsedDate as Date) AS  UsedDate,  
    count (distinct dbo.Usage.ProductID) AS used
    FROM 
        dbo.Usage
    WHERE (dbo.Usage.UsedDate BETWEEN (@StartDate) AND (@EndDate))
    GROUP BY Cast (dbo.Usage.UsedDate as Date)

But how can I partition the rolling date rages and get a table of the dates with the aggregate distinct counts of Products?

So if the startDate is 10/10/2017 the endDate is 13/10/2017 the period is 2 days

That means that the first row will be for 10/10/2017 and the next row 10/10/2017 data + 11/10/2017 data.
However the third row is just the data from the 13/10/2017 as its in the next period if there had of been a 12/10/2017 it would have been in that new period too

 UsedDate, within_day_distinct_count, roll_distinct_count
  10/10/2017 2, 2
  11/10/2017 2, 3
  13/10/2017 2, 2

Upvotes: 0

Views: 80

Answers (2)

sepupic
sepupic

Reputation: 8687

If I understood you well, all the period is divided into 3 parts: before @startDate, BETWEEN @StartDate AND @EndDate, and after @EndDate.

In your example only 2 periods are presented: between and after.

In every period running distinct count have to be calculated separately, so here is the code with union all for all 3 periods:

declare @t table (UsedDate date, ProductID varchar(100), Cnt int, usedBy varchar(100));

insert into @t values

('20171010', 'Widgit1', 14, 'adr'),
('20171010', 'Widgit1', 20, 'mmg'),
('20171010', 'Widgit3', 5, 'mmg'),
('20171011', 'Widgit2', 1, 'adr'),
('20171011', 'Widgit1', 15, 'adr'),
('20171011', 'Widgit2', 15, 'mmg'),

('20171013', 'Widgit2', 8, 'adr'),
('20171013', 'Widgit3', 5, 'adr'),
('20171013', 'Widgit3', 4, 'mmg');

declare @startDate date = '20171010', @endDate date = '20171011'

select t.UsedDate,
       count(distinct t.ProductId) as within_day_distinct_count,
       count(distinct t1.ProductId) as roll_distinct_count
from @t t join @t t1
        on t.UsedDate >= t1.UsedDate 
WHERE t.UsedDate BETWEEN @StartDate AND @EndDate
and   t1.UsedDate BETWEEN @StartDate AND @EndDate
group by t.UsedDate

union all 

select t.UsedDate,
       count(distinct t.ProductId) as within_day_distinct_count,
       count(distinct t1.ProductId) as roll_distinct_count
from @t t join @t t1
        on t.UsedDate >= t1.UsedDate 
WHERE t.UsedDate > @EndDate
and   t1.UsedDate > @EndDate
group by t.UsedDate

union all 

select t.UsedDate,
       count(distinct t.ProductId) as within_day_distinct_count,
       count(distinct t1.ProductId) as roll_distinct_count
from @t t join @t t1
        on t.UsedDate >= t1.UsedDate 
WHERE t.UsedDate < @StartDate
and   t1.UsedDate < @StartDate
group by t.UsedDate;

Upvotes: 1

You can use a Tally to create the date range.

A good practice is to create a View to the tally table.

DECLARE @Usage as table (UsedDate date , ProductID varchar(10) , Count int , usedBy varchar(5)) 
INSERT @Usage(UsedDate, ProductID, Count, usedBy) VALUES 
 ('20171010', 'Widgit1', 14, 'adr')
,('20171010', 'Widgit1', 20, 'mmg')
,('20171010', 'Widgit3', 5,  'mmg')
,('20171011', 'widgit2', 1,  'adr')
,('20171011', 'widgit1', 15, 'adr')
,('20171011', 'widgit2', 15, 'mmg')
,('20171013', 'widgit2', 8,  'adr')
,('20171013', 'widgit3', 5,  'adr')
,('20171013', 'widgit3', 4,  'mmg')


DECLARE @StartDate date = '20171010'
DECLARE @EndDate date  = '20171013'

;WITH lv0(N) AS (SELECT 0 FROM (VALUES (1),(1))G(N))
,lv1(N) AS (SELECT 0 FROM lv0 a CROSS JOIN lv0 b) -- 4
,lv2(N) AS (SELECT 0 FROM lv1 a CROSS JOIN lv1 b) -- 16
,lv3(N) AS (SELECT 0 FROM lv2 a CROSS JOIN lv2 b) -- 256
,lv4(N) AS (SELECT 0 FROM lv3 a CROSS JOIN lv3 b) -- 65,536
,lv5(N) AS (SELECT 0 FROM lv4 a CROSS JOIN lv4 b) -- 4,294,967,296
,cteTally(N) AS
(
SELECT  ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM lv5
)
,
DateRange AS ( 
SELECT DATEADD(day,N -1 ,@StartDate) DR 
FROM cteTally where N <= DATEDIFF(Day, @StartDate , @EndDate) + 1 
) 


,ProductByday AS (SELECT DISTINCT
    T.DR AS  UsedDate,  
    U.ProductID  used
    FROM 
    DateRange T
    LEFT JOIN @Usage U
    ON
    T.DR = Cast ( U.UsedDate as date)
  )

SELECT Distinct A.UsedDate , Count(Distinct B.used) used  from ProductByday A 
LEFT JOIN ProductByday B
ON A.UsedDate >= B.UsedDate
GROUP BY 
A.UsedDate

Result

UsedDate   used
---------- -----------
2017-10-10 2
2017-10-11 3
2017-10-12 3
2017-10-13 3

Upvotes: 0

Related Questions