chrissy p
chrissy p

Reputation: 823

Date bound SQL group by

I have a dataset that looks like this:

StartDate  EndDate    InstrumentID   Dimension   DimensionValue
2018-01-01 2018-01-01 123            Currency    GBP
2018-01-02 2018-01-02 123            Currency    GBP
2018-01-03 2018-01-03 123            Currency    USD
2018-01-04 2018-01-04 123            Currency    USD
2018-01-05 2018-01-05 123            Currency    GBP
2018-01-06 2018-01-06 123            Currency    GBP

What I would like is to transform this dataset into a date bound dataset like below:

StartDate  EndDate    InstrumentID   Dimension   DimensionValue
2018-01-01 2018-01-02 123            Currency    GBP
2018-01-03 2018-01-04 123            Currency    USD
2018-01-05 2018-01-06 123            Currency    GBP

I thought about writing the SQL like this:

SELECT
        MIN(StartDate)      AS StartDate
   ,    MAX(EndDate)        AS EndDate
   ,    [InstrumentID]
   ,    Dimension
   ,    DimensionValue

FROM #Worktable
GROUP BY InstrumentID, Dimension, DimensionValue

However this obviously won't work as it will ignore the change in date for GBP and just group one record together with start date of 2018-01-01 and end date of 2018-01-06.

Is there a way in which I can do this and achieve the dates I require?

Thanks

Upvotes: 0

Views: 225

Answers (5)

Newbie92
Newbie92

Reputation: 27

Try something like the following:

WITH CTE AS(
    SELECT StartDate::DATE AS StartDate,
           EndDate::DATE  AS EndDate,
           InstrumentID,
           Dimension,
           DimensionValue 
    FROM (VALUES('20180101','20180101',123,'Currency','GBP'),
                ('20180102','20180102',123,'Currency','GBP'),
                ('20180103','20180103',123,'Currency','USD'),
                ('20180104','20180104',123,'Currency','USD'),
                ('20180105','20180105',123,'Currency','GBP'),
                ('20180106','20180106',123,'Currency','GBP')) V(StartDate,EndDate,InstrumentID,Dimension,DimensionValue))

SELECT startdate
, enddate
, instrumentid
, dimension
, dimensionvalue
FROM (
SELECT * 
, CASE WHEN (LAG(enddate, 1) OVER(PARTITION BY dimensionvalue ORDER BY startdate) IS NULL) OR (enddate - LAG(enddate, 1) OVER(PARTITION BY dimensionvalue ORDER BY startdate) <> 1) THEN 0 
       ELSE 1 END is_valid
FROM CTE
) a 
WHERE is_valid = 1
ORDER BY startdate;

Credit to @Lamu for creating the temp table.

Upvotes: 0

Thom A
Thom A

Reputation: 95554

This is a common Gaps and Islands question. There are plenty of examples out there on how to do this; for example:

WITH VTE AS(
    SELECT CONVERT(date,StartDate) AS StartDate,
           CONVERT(Date,EndDate) AS EndDate,
           InstrumentID,
           Dimension,
           DimensionValue 
    FROM (VALUES('20180101','20180101',123,'Currency','GBP'),
                ('20180102','20180102',123,'Currency','GBP'),
                ('20180103','20180103',123,'Currency','USD'),
                ('20180104','20180104',123,'Currency','USD'),
                ('20180105','20180105',123,'Currency','GBP'),
                ('20180106','20180106',123,'Currency','GBP')) V(StartDate,EndDate,InstrumentID,Dimension,DimensionValue)),
Grps AS (
    SELECT StartDate,
           EndDate,
           InstrumentID,
           Dimension,
           DimensionValue,
           ROW_NUMBER() OVER (PARTITION BY InstrumentID, Dimension ORDER BY StartDate) - 
           ROW_NUMBER() OVER (PARTITION BY InstrumentID, Dimension, DimensionValue ORDER BY StartDate) AS Grp
    FROM VTE)
SELECT MIN(StartDate) AS StartDate,
       MAX(EndDate) AS EndDate,
       InstrumentID,
       Dimension,
       DimensionValue
FROM Grps
GROUP BY InstrumentID,
         Dimension,
         DimensionValue,
         Grp
ORDER BY StartDate;

Upvotes: 2

Gordon Linoff
Gordon Linoff

Reputation: 1269533

This is a form of gaps-and-islands. But because there are start date and end dates, you need to be careful. I recommend lag() and cumulative sum:

select InstrumentID, Dimension, DimensionValue,
       min(startdate) as startdate, max(enddate) as enddate
from (select w.*,
             sum(case when prev_enddate = startdate then 0 else 1 end) 
                 over (partition by InstrumentID, Dimension, 
                     DimensionValue order by startdate) as grp
      from (select w.*,
                   lag(enddate) over (partition by InstrumentID, Dimension, DimensionValue 
                       order by startdate) as prev_enddate
            from #worktable w
     ) w
group by InstrumentID, Dimension, DimensionValue, grp
order by InstrumentID, Dimension, DimensionValue, min(startdate);

Upvotes: 1

Boris Jovanovic
Boris Jovanovic

Reputation: 132

Update, I just thought of this, I couldn't test it yet, I think it will work the way you want it to.

Select StartDate, EndDate, InstrumentID, Dimension, DimensionValue From (
SELECT
      StartDate      AS StartDate
 ,    EndDate        AS EndDate
 ,    [InstrumentID]
 ,    Dimension     
 ,    DimensionValue
 ,    Count(*)
FROM #Worktable
GROUP BY InstrumentID, StartDate, EndDate, Dimension, DimensionValue) x

Hope this helps!

Upvotes: 0

user10679624
user10679624

Reputation:

You need to use dense rank like:

with x as(
    select DENSE_RANK() OVER   
        (PARTITION BY DimensionValue) AS Rank  , * 
    from Worktable
    ) select StartDate AS StartDate
     ,    EndDate        AS EndDate
     ,    [InstrumentID]
     ,    Max(Dimension)      AS Dimension
     ,    DimensionValue, Rank
    FROM x
    GROUP BY InstrumentID, StartDate, EndDate, DimensionValue,Rank

Upvotes: 0

Related Questions