Alex Lizz
Alex Lizz

Reputation: 445

T-SQL aggregate window functions over specific time interval

Here's a SQL 2012 table:

CREATE TABLE [dbo].[TBL_BID]
(
    [ID] [varchar](max) NULL,
    [VALUE] [smallint] NULL,
    [DT_START] [date] NULL,
    [DT_FIN] [date] NULL
)

I can easily get last event's value, time since last event (or any specific lags) by LAG window function, as well as total number of events (or over specific number of past events), total average per user, etc

SELECT 
    ID,
    [VALUE],
    [DT_START], [DT_FIN],
    -- days since the end of last event
    DATEDIFF(d, LAG([DT_FIN], 1) OVER (PARTITION BY ID ORDER BY [DT_FIN]),
    [DT_START]) AS LAG1_DT,
    -- value of the last event
    LAG([VALUE], 1) OVER (PARTITION BY ID ORDER BY [DT_FIN]) AS LAG1_VALUE,
    -- number of events per id
    COUNT(ID) OVER (PARTITION BY ID) AS N,
    -- average [value] per id
    ROUND(AVG(CAST([VALUE] as float)) OVER (PARTITION BY ID), 1) AS VAL_AVG
FROM 
    TBL_BID

I am trying to get for events happened over specified time interval, i.e 10 days, 30 days, 180 days, etc, before the start date of each event

Something along the lines of:

COUNT(ID) OVER (PARTITION BY ID ORDER BY DT_FIN 
                RANGE BETWEEN DATEDIFF(d,-30,[DT_START]) AND [DT_START] )

UPDATE 4/19/2017: Some statistics About 20MM IDs, the time interval is 5 years, mean number of events per ID is 3.0. It could be 100+ events per ID, but majority has only handful of events, the distribution is very right skewed

Events_per_ID   Number_IDs
1   18676221
2   11254167
3   6992200
4   4487664
5   2933183
6   1957433
7   1330040
8   918873
9   644229
10  457858
........

Upvotes: 2

Views: 2944

Answers (2)

Gordon Linoff
Gordon Linoff

Reputation: 1270011

The simplest approach is outer apply:

select . . .,
       b.cnt_30
from TBL_BID b outer apply
     (select count(*) as cnt_30
      from TBL_BID b2
      where b2.id = b.id and
            b2.dt_start >= dateadd(day, -30, b.dt_start) and
            b2.dt_start <= b.dt_start
     ) b;

This is not necessarily really efficient. You can readily extend it by adding more outer apply subqueries.

Upvotes: 1

David דודו Markovitz
David דודו Markovitz

Reputation: 44951

Need some more information, but the basic idea is to transform the windows functions type from range to rows by generating the full range of dates for each ID.

  • For each ID generate the relevant range of days (min(dt_start)-180 to max(dt_start))
  • Use the above row set as a base and LEFT JOIN TBL_BID on id and dt_fin (if (id,dt_fin) is not unique, aggregate first)
  • Use windows functions partition by id order by date rows between 180/30/10 preceding and current row

Upvotes: 0

Related Questions