Reputation: 12253
I have the following columns
VehicleId
,StatusId
,StartTime
,Comments
The data is basically a sequence of changes to VehicleId
,StatusId
with the time they occurred and comments. Each vehicle can have a statusid occur many times in the last few years. The data spans a period of years, but isn't a huge dataset 50000 rows. We only are interested in datetime of each change of status for each vehicle. So that would mean that I have many combinations of vehicle and statusid and the first start date seen in sequence. So a vehicle had a StatusId
in two different sequences I would want the earliest date of each sequence, the VehicleId
and that StatusId
. How can I get that?
I created a sql fiddle here
I looked at RANK() OVER (PARTITION BY [VehicleId],[StatusID] ORDER BY StartTime) AS [Group]
etc but they don't seem to help
Input
VehicleId | StatusId | StartTime
1 | 1 | 2010-1-1
1 | 1 | 2010-1-2
1 | 2 | 2010-1-3
1 | 2 | 2010-1-4
1 | 1 | 2010-1-5
2 | 1 | 2010-1-2
Output
VehicleId | StatusId | FirstStartTime
1 | 1 | 2010-1-1
1 | 2 | 2010-1-3
1 | 1 | 2010-1-5
2 | 1 | 2010-1-2
So I basically need to reduce the output rows to only have the start of each sequence ingoring any changes to the row unless it is a change in VehcileId or StatusId
Upvotes: 3
Views: 53
Reputation: 1269503
You can identify the groups using a difference of row numbers approach. Then aggregation gets what you want:
select vehicleId, statusId, min(StartTime) as FirstStartTime
from (select se.*,
(row_number() over (partition by vehicleId order by StartTime) -
row_number() over (partition by vehicleId, StatusId order by StartTime)
) as grp
from StatusEvents se
) se
group by grp, statusId, vehicleId
order by vehicleId, FirstStartTime;
Upvotes: 3