Reputation: 43
I have the following data set as a data.table (dt):
time wea
1 2016-06-16 00:04:50 cloudy
2 2016-06-16 02:14:55 cloudy
3 2016-06-16 03:19:04 cloudy
4 2016-06-16 04:19:24 cloudy
5 2016-06-16 05:23:45 cloudy
6 2016-06-16 06:25:05 cloudy
7 2016-06-16 07:26:46 cloudy
8 2016-06-16 08:27:06 cloudy
9 2016-06-16 09:34:09 sunny
10 2016-06-16 10:36:29 sunny
11 2016-06-16 11:39:09 sunny
12 2016-06-16 12:39:29 sunny
13 2016-06-16 13:44:50 rainy
14 2016-06-16 14:46:31 rainy
15 2016-06-16 15:47:12 rainy
16 2016-06-16 16:48:12 rainy
17 2016-06-16 17:49:54 sunny
18 2016-06-16 18:50:34 sunny
19 2016-06-16 19:51:34 cloudy
20 2016-06-16 20:53:55 cloudy
21 2016-06-16 21:56:51 cloudy
22 2016-06-16 22:30:42 cloudy
23 2016-06-16 23:26:14 cloudy
And I want this:
wea start.point end.point
1: cloudy 1 8
2: sunny 9 12
3: rainy 13 16
4: sunny 17 18
5: cloudy 19 23
I tried running the following, but without success:
dt[, .(start.point = head(.I,1), end.point = tail(.I,1)) , by="wea"]
because by = "wea" groups the second 'cloudy' block together with the first 'cloudy' block, but I want each consecutive block kept separate.
Upvotes: 4
Views: 204
Reputation: 83215
You could use .I to create a row index and rleid in the by statement to separate the consecutive parts. Using:
dt[, .(start.point = .I[1], end.point = .I[.N]), by = .(wea, rleid(wea))
][, rleid := NULL][]
gives:
wea start.point end.point
1: cloudy 1 8
2: sunny 9 12
3: rainy 13 16
4: sunny 17 18
5: cloudy 19 23
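To see why this works, here is a small reproducible sketch. It rebuilds only the wea column from the question (the timestamps are left out because the grouping does not use them), and the column name run is just an illustrative label of mine, not something from the original code. rleid() assigns a new id every time the value changes, so the two 'cloudy' blocks end up with different ids and therefore in different groups:

```r
library(data.table)

# Rebuild only the wea column from the question; run lengths match the 23 rows shown.
dt <- data.table(
  wea = rep(c("cloudy", "sunny", "rainy", "sunny", "cloudy"),
            times = c(8, 4, 4, 2, 5))
)

# rleid() gives each run of identical consecutive values its own id.
runs <- rleid(dt$wea)

# Group by value and run id; .I holds the original row numbers of each group,
# so .I[1] is the first row of the run and .I[.N] is the last.
# "run" is a hypothetical helper column, dropped again afterwards.
res <- dt[, .(start.point = .I[1], end.point = .I[.N]),
          by = .(wea, run = rleid(wea))][, run := NULL][]

print(res)
```

Naming the helper column explicitly (run = rleid(wea)) avoids relying on the auto-generated column name when dropping it afterwards.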
Upvotes: 5