Reputation: 823
I have a dataset that looks like this:
StartDate EndDate InstrumentID Dimension DimensionValue
2018-01-01 2018-01-01 123 Currency GBP
2018-01-02 2018-01-02 123 Currency GBP
2018-01-03 2018-01-03 123 Currency USD
2018-01-04 2018-01-04 123 Currency USD
2018-01-05 2018-01-05 123 Currency GBP
2018-01-06 2018-01-06 123 Currency GBP
What I would like is to transform this dataset into a date bound dataset like below:
StartDate EndDate InstrumentID Dimension DimensionValue
2018-01-01 2018-01-02 123 Currency GBP
2018-01-03 2018-01-04 123 Currency USD
2018-01-05 2018-01-06 123 Currency GBP
I thought about writing the SQL like this:
SELECT
MIN(StartDate) AS StartDate
, MAX(EndDate) AS EndDate
, [InstrumentID]
, Dimension
, DimensionValue
FROM #Worktable
GROUP BY InstrumentID, Dimension, DimensionValue
However this obviously won't work as it will ignore the change in date for GBP and just group one record together with start date of 2018-01-01 and end date of 2018-01-06.
Is there a way in which I can do this and achieve the dates I require?
Thanks
Upvotes: 0
Views: 225
Reputation: 27
Try something like the following:
WITH CTE AS(
SELECT StartDate::DATE AS StartDate,
EndDate::DATE AS EndDate,
InstrumentID,
Dimension,
DimensionValue
FROM (VALUES('20180101','20180101',123,'Currency','GBP'),
('20180102','20180102',123,'Currency','GBP'),
('20180103','20180103',123,'Currency','USD'),
('20180104','20180104',123,'Currency','USD'),
('20180105','20180105',123,'Currency','GBP'),
('20180106','20180106',123,'Currency','GBP')) V(StartDate,EndDate,InstrumentID,Dimension,DimensionValue))
SELECT startdate
, enddate
, instrumentid
, dimension
, dimensionvalue
FROM (
SELECT *
, CASE WHEN (LAG(enddate, 1) OVER(PARTITION BY dimensionvalue ORDER BY startdate) IS NULL) OR (enddate - LAG(enddate, 1) OVER(PARTITION BY dimensionvalue ORDER BY startdate) <> 1) THEN 0
ELSE 1 END is_valid
FROM CTE
) a
WHERE is_valid = 1
ORDER BY startdate;
Credit to @Lamu for creating the temp table.
Upvotes: 0
Reputation: 95554
This is a common Gaps and Islands question. There are plenty of examples out there on how to do this; for example:
WITH VTE AS(
SELECT CONVERT(date,StartDate) AS StartDate,
CONVERT(Date,EndDate) AS EndDate,
InstrumentID,
Dimension,
DimensionValue
FROM (VALUES('20180101','20180101',123,'Currency','GBP'),
('20180102','20180102',123,'Currency','GBP'),
('20180103','20180103',123,'Currency','USD'),
('20180104','20180104',123,'Currency','USD'),
('20180105','20180105',123,'Currency','GBP'),
('20180106','20180106',123,'Currency','GBP')) V(StartDate,EndDate,InstrumentID,Dimension,DimensionValue)),
Grps AS (
SELECT StartDate,
EndDate,
InstrumentID,
Dimension,
DimensionValue,
ROW_NUMBER() OVER (PARTITION BY InstrumentID, Dimension ORDER BY StartDate) -
ROW_NUMBER() OVER (PARTITION BY InstrumentID, Dimension, DimensionValue ORDER BY StartDate) AS Grp
FROM VTE)
SELECT MIN(StartDate) AS StartDate,
MAX(EndDate) AS EndDate,
InstrumentID,
Dimension,
DimensionValue
FROM Grps
GROUP BY InstrumentID,
Dimension,
DimensionValue,
Grp
ORDER BY StartDate;
Upvotes: 2
Reputation: 1269533
This is a form of gaps-and-islands. But because there are start date and end dates, you need to be careful. I recommend lag()
and cumulative sum:
select InstrumentID, Dimension, DimensionValue,
min(startdate) as startdate, max(enddate) as enddate
from (select w.*,
sum(case when prev_enddate = startdate then 0 else 1 end)
over (partition by InstrumentID, Dimension,
DimensionValue order by startdate) as grp
from (select w.*,
lag(enddate) over (partition by InstrumentID, Dimension, DimensionValue
order by startdate) as prev_enddate
from #worktable w
) w
group by InstrumentID, Dimension, DimensionValue, grp
order by InstrumentID, Dimension, DimensionValue, min(startdate);
Upvotes: 1
Reputation: 132
Update, I just thought of this, I couldn't test it yet, I think it will work the way you want it to.
Select StartDate, EndDate, InstrumentID, Dimension, DimensionValue From (
SELECT
StartDate AS StartDate
, EndDate AS EndDate
, [InstrumentID]
, Dimension
, DimensionValue
, Count(*)
FROM #Worktable
GROUP BY InstrumentID, StartDate, EndDate, Dimension, DimensionValue) x
Hope this helps!
Upvotes: 0
Reputation:
You need to use dense rank like:
with x as(
select DENSE_RANK() OVER
(PARTITION BY DimensionValue) AS Rank , *
from Worktable
) select StartDate AS StartDate
, EndDate AS EndDate
, [InstrumentID]
, Max(Dimension) AS Dimension
, DimensionValue, Rank
FROM x
GROUP BY InstrumentID, StartDate, EndDate, DimensionValue,Rank
Upvotes: 0