Reputation: 37
I use SQL Server 2012. I have some table with daily history for delay sum, something like this:
SET DATEFORMAT YMD
GO
CREATE TABLE [dbo].[testsum](
[CID] [int],
[HDATE] [date],
[DELAYSUM] [numeric](16, 2)
)
GO
INSERT [dbo].[testsum] ([CID], [HDATE], [DELAYSUM]) VALUES
(223,'2016-10-16',15503.80)
,(223,'2016-10-17',15493.82)
,(223,'2016-10-18',15489.25)
,(223,'2016-10-19',15417.08)
,(427,'2016-10-01',10375.89)
,(427,'2016-10-02',10375.89)
,(427,'2016-10-03',10385.91)
,(427,'2016-10-16',8448.57)
,(427,'2016-10-17',8443.13)
,(427,'2016-10-18',8440.64)
,(427,'2016-10-19',8401.31)
,(427,'2016-10-20',8411.20)
,(427,'2016-10-21',8414.58)
,(427,'2016-10-22',8414.58)
,(427,'2016-10-23',8414.58)
,(427,'2016-10-24',8401.23)
,(427,'2016-10-25',8393.92)
,(427,'2016-10-26',8379.14)
,(427,'2016-10-27',8374.57)
,(427,'2016-10-28',8358.67)
,(427,'2016-10-29',8358.67)
,(427,'2016-10-30',8358.67)
,(427,'2016-10-31',8346.61)
,(541,'2016-10-05',900.44)
,(541,'2016-10-06',832.84)
,(541,'2016-10-11',637.54)
,(541,'2016-10-15',413.89)
,(541,'2016-10-16',413.89)
,(541,'2016-10-17',413.89)
,(541,'2016-10-18',1728.12)
,(541,'2016-10-22',265.27)
,(541,'2016-10-23',265.27)
,(541,'2016-10-24',265.27)
,(541,'2016-10-25',787.10)
,(541,'2016-10-26',1222.29)
Example data for 3 ids in October:
CID HDATE DELAYSUM
----------- ---------- ---------------------------------------
223 2016-10-16 15503.80
223 2016-10-17 15493.82
223 2016-10-18 15489.25
223 2016-10-19 15417.08
427 2016-10-01 10375.89
427 2016-10-02 10375.89
427 2016-10-03 10385.91
427 2016-10-16 8448.57
427 2016-10-17 8443.13
427 2016-10-18 8440.64
427 2016-10-19 8401.31
427 2016-10-20 8411.20
427 2016-10-21 8414.58
427 2016-10-22 8414.58
427 2016-10-23 8414.58
427 2016-10-24 8401.23
427 2016-10-25 8393.92
427 2016-10-26 8379.14
427 2016-10-27 8374.57
427 2016-10-28 8358.67
427 2016-10-29 8358.67
427 2016-10-30 8358.67
427 2016-10-31 8346.61
541 2016-10-05 900.44
541 2016-10-06 832.84
541 2016-10-11 637.54
541 2016-10-15 413.89
541 2016-10-16 413.89
541 2016-10-17 413.89
541 2016-10-18 1728.12
541 2016-10-22 265.27
541 2016-10-23 265.27
541 2016-10-24 265.27
541 2016-10-25 787.10
541 2016-10-26 1222.29
Need output (minimum date for each section of dates in ids (CID) with end date for each period (section)) Segments are separated by 1 or more days:
CID HDATE DELAYSUM END_DATE
----------- ---------- ---------------------------------------
223 2016-10-16 15503.80 2016-10-19
427 2016-10-01 10375.89 2016-10-03
427 2016-10-16 8448.57 2016-10-31
541 2016-10-05 900.44 2016-10-06
541 2016-10-11 637.54 2016-10-11
541 2016-10-15 413.89 2016-10-18
541 2016-10-22 265.27 2016-10-26
Stuck with this task for now. Sorry for my English.
Upvotes: 2
Views: 1848
Reputation: 1270513
One method for solving this is the difference of row numbers:
select cid, min(hdate), max(hdate), min(delaysum)
from (select t.*,
row_number() over (order by hdate) as seqnum,
row_number() over (partition by cid order by hdate) as seqnum_c
from testsum t
) t
group by cid, (seqnum - seqnum_c);
EDIT:
As I look closer, it looks like you want the first value, rather than the minimum value. SQL Server doesn't offer first_value()
(yet) as an aggregation function. So:
select cid, min(hdate), max(hdate), min(first_delaysum)
from (select t.*,
first_value(delaysum) over (partition by cid, seqnum - seqnum_c order by hdate) as first_delaysum
from (select t.*,
row_number() over (order by hdate) as seqnum,
row_number() over (partition by cid order by hdate) as seqnum_c
from testsum t
) t
) t
group by cid, (seqnum - seqnum_c);
Upvotes: 0
Reputation: 49260
The key here is to classify the records per cid into the same group if the date difference between consecutive rows is 1. This query uses a logic dateadd(day,-row_number() over (partition by cid order by hdate),hdate)
to do it. Run the inner query to see how the groups are assigned.
Thereafter, using window functions min
,max
and first_value
you can get the min hdate, max hdate and first value of delay sum per cid using the groups previously assigned.
SELECT DISTINCT cid,
min(hdate) over (partition BY cid, grp) AS hdate,
first_value(delaysum) over (partition BY cid, grp ORDER BY hdate) AS delaysum,
max(hdate) over (partition BY cid, grp) AS end_date
FROM (SELECT t.* ,
dateadd(DAY,-row_number() over (partition BY cid ORDER BY hdate),hdate) AS grp
FROM testsum t ) x
ORDER BY cid,hdate
Upvotes: 1