Reputation: 3
I have a dataframe as below:
Id DateTIme Status
1: 101 10/01/2014 09:32 On
2: 101 10/01/2014 10:50 On
3: 101 10/01/2014 21:32 Off
4: 101 11/01/2014 15:32 Off
5: 101 11/01/2014 21:21 Off
6: 127 10/01/2014 10:13 Off
7: 127 11/01/2014 20:21 Off
8: 127 11/01/2014 23:10 Off
9: 127 12/01/2014 12:02 Off
10: 127 12/01/2014 21:00 On
11: 127 13/01/2014 03:24 On
12: 763 11/01/2014 12:01 Off
13: 763 11/01/2014 22:10 Off
14: 763 12/01/2014 09:32 On
15: 763 13/01/2014 09:21 On
16: 763 13/01/2014 20:23 On
17: 763 14/01/2014 15:12 On
18: 763 14/01/2014 23:51 Off
19: 763 15/01/2014 09:23 Off
The dataframe is order by Id and DateTime.
I need to find the initial and end time for each Status change, for each Id. So, in this case I would expect to output something like this:
Id Status InitialTime EndTime
1: 101 On 10/01/2014 09:32 10/01/2014 10:50
2: 101 Off 10/01/2014 21:32 11/01/2014 21:21
3: 127 Off 10/01/2014 10:13 12/01/2014 12:02
4: 127 On 12/01/2014 21:00 13/01/2014 03:24
5: 763 Off 11/01/2014 12:01 11/01/2014 22:10
6: 763 On 12/01/2014 09:32 14/01/2014 15:12
7: 763 Off 14/01/2014 23:51 15/01/2014 09:23
Upvotes: 0
Views: 54
Reputation: 887951
It seems like the OP's data is already a data.table
. In case, it is not, convert to 'data.table' (setDT(df1)
), grouped by 'Id', 'Status' and the run-length-id of 'Status', we get the first 'DateTIme' and last 'DateTIme' to summarise the dataset to have 'InitialTime' and 'EndTime' columns (respectively)
library(data.table)
setDT(df1)[, .(InitialTime = DateTIme[1L], EndTime=DateTIme[.N]) ,
.(Id, Status, Status1 = rleid(Status))][, Status1 := NULL][]
# Id Status InitialTime EndTime
#1: 101 On 10/01/2014 09:32 10/01/2014 10:50
#2: 101 Off 10/01/2014 21:32 11/01/2014 21:21
#3: 127 Off 10/01/2014 10:13 12/01/2014 12:02
#4: 127 On 12/01/2014 21:00 13/01/2014 03:24
#5: 763 Off 11/01/2014 12:01 11/01/2014 22:10
#6: 763 On 12/01/2014 09:32 14/01/2014 15:12
#7: 763 Off 14/01/2014 23:51 15/01/2014 09:23
Upvotes: 1