A. Anatolii
A. Anatolii

Reputation: 37

How to get minimum date by each section for same id different times? - SQL Server

I use SQL Server 2012. I have some table with daily history for delay sum, something like this:

SET DATEFORMAT YMD
GO
    CREATE TABLE [dbo].[testsum](
        [CID] [int],
        [HDATE] [date],
        [DELAYSUM] [numeric](16, 2)
)
GO
INSERT [dbo].[testsum] ([CID], [HDATE], [DELAYSUM]) VALUES
        (223,'2016-10-16',15503.80)
        ,(223,'2016-10-17',15493.82)
        ,(223,'2016-10-18',15489.25)
        ,(223,'2016-10-19',15417.08)
        ,(427,'2016-10-01',10375.89)
        ,(427,'2016-10-02',10375.89)
        ,(427,'2016-10-03',10385.91)
        ,(427,'2016-10-16',8448.57)
        ,(427,'2016-10-17',8443.13)
        ,(427,'2016-10-18',8440.64)
        ,(427,'2016-10-19',8401.31)
        ,(427,'2016-10-20',8411.20)
        ,(427,'2016-10-21',8414.58)
        ,(427,'2016-10-22',8414.58)
        ,(427,'2016-10-23',8414.58)
        ,(427,'2016-10-24',8401.23)
        ,(427,'2016-10-25',8393.92)
        ,(427,'2016-10-26',8379.14)
        ,(427,'2016-10-27',8374.57)
        ,(427,'2016-10-28',8358.67)
        ,(427,'2016-10-29',8358.67)
        ,(427,'2016-10-30',8358.67)
        ,(427,'2016-10-31',8346.61)
        ,(541,'2016-10-05',900.44)
        ,(541,'2016-10-06',832.84)
        ,(541,'2016-10-11',637.54)
        ,(541,'2016-10-15',413.89)
        ,(541,'2016-10-16',413.89)
        ,(541,'2016-10-17',413.89)
        ,(541,'2016-10-18',1728.12)
        ,(541,'2016-10-22',265.27)
        ,(541,'2016-10-23',265.27)
        ,(541,'2016-10-24',265.27)
        ,(541,'2016-10-25',787.10)
        ,(541,'2016-10-26',1222.29)

Example data for 3 ids in October:

    CID         HDATE      DELAYSUM
    ----------- ---------- ---------------------------------------
    223       2016-10-16   15503.80
    223       2016-10-17   15493.82
    223       2016-10-18   15489.25
    223       2016-10-19   15417.08
    427       2016-10-01   10375.89
    427       2016-10-02   10375.89
    427       2016-10-03   10385.91
    427       2016-10-16   8448.57
    427       2016-10-17   8443.13
    427       2016-10-18   8440.64
    427       2016-10-19   8401.31
    427       2016-10-20   8411.20
    427       2016-10-21   8414.58
    427       2016-10-22   8414.58
    427       2016-10-23   8414.58
    427       2016-10-24   8401.23
    427       2016-10-25   8393.92
    427       2016-10-26   8379.14
    427       2016-10-27   8374.57
    427       2016-10-28   8358.67
    427       2016-10-29   8358.67
    427       2016-10-30   8358.67
    427       2016-10-31   8346.61
    541       2016-10-05   900.44
    541       2016-10-06   832.84
    541       2016-10-11   637.54
    541       2016-10-15   413.89
    541       2016-10-16   413.89
    541       2016-10-17   413.89
    541       2016-10-18   1728.12
    541       2016-10-22   265.27
    541       2016-10-23   265.27
    541       2016-10-24   265.27
    541       2016-10-25   787.10
    541       2016-10-26   1222.29

Need output (minimum date for each section of dates in ids (CID) with end date for each period (section)) Segments are separated by 1 or more days:

    CID         HDATE      DELAYSUM    END_DATE
    ----------- ---------- ---------------------------------------
    223       2016-10-16   15503.80    2016-10-19
    427       2016-10-01   10375.89    2016-10-03
    427       2016-10-16   8448.57     2016-10-31
    541       2016-10-05   900.44      2016-10-06
    541       2016-10-11   637.54      2016-10-11
    541       2016-10-15   413.89      2016-10-18
    541       2016-10-22   265.27      2016-10-26

Stuck with this task for now. Sorry for my English.

Upvotes: 2

Views: 1848

Answers (2)

Gordon Linoff
Gordon Linoff

Reputation: 1270513

One method for solving this is the difference of row numbers:

select cid, min(hdate), max(hdate), min(delaysum)
from (select t.*,
             row_number() over (order by hdate) as seqnum,
             row_number() over (partition by cid order by hdate) as seqnum_c  
      from testsum t
     ) t
group by cid, (seqnum - seqnum_c);

EDIT:

As I look closer, it looks like you want the first value, rather than the minimum value. SQL Server doesn't offer first_value() (yet) as an aggregation function. So:

select cid, min(hdate), max(hdate), min(first_delaysum)
from (select t.*,
             first_value(delaysum) over (partition by cid, seqnum - seqnum_c order by hdate) as first_delaysum
      from (select t.*,
                   row_number() over (order by hdate) as seqnum,
                   row_number() over (partition by cid order by hdate) as seqnum_c   
            from testsum t
           ) t
     ) t
group by cid, (seqnum - seqnum_c);                   

Upvotes: 0

Vamsi Prabhala
Vamsi Prabhala

Reputation: 49260

The key here is to classify the records per cid into the same group if the date difference between consecutive rows is 1. This query uses a logic dateadd(day,-row_number() over (partition by cid order by hdate),hdate) to do it. Run the inner query to see how the groups are assigned.

Thereafter, using window functions min,max and first_value you can get the min hdate, max hdate and first value of delay sum per cid using the groups previously assigned.

SELECT DISTINCT cid,
                min(hdate) over (partition BY cid, grp) AS hdate,
                first_value(delaysum) over (partition BY cid, grp ORDER BY hdate) AS delaysum,
                max(hdate) over (partition BY cid, grp) AS end_date
FROM (SELECT t.* ,
      dateadd(DAY,-row_number() over (partition BY cid ORDER BY hdate),hdate) AS grp
      FROM testsum t ) x
ORDER BY cid,hdate

Sample Demo

Upvotes: 1

Related Questions