Reputation: 71
I've a df like this one:
mydate daynight myvar
195 2018-11-21 1 64
196 2018-11-21 2 4
197 2018-11-22 1 7
198 2018-11-22 2 2
199 2018-11-23 2 0
200 2018-11-24 2 0
201 2018-11-25 2 64
202 2018-11-26 2 6
203 2018-11-27 2 0
204 2018-11-28 2 0
205 2018-11-29 2 20
206 2018-11-30 2 4
207 2018-12-01 1 64
214 2018-12-04 2 1
215 2018-12-05 1 4
216 2018-12-05 2 0
217 2018-12-08 2 30
218 2018-12-09 2 15
219 2018-12-10 2 0
220 2018-12-11 2 14
221 2018-12-12 1 28
I'm trying to identify groups of at least 3 following 2
in the daynight
column and I want to know the beginning and end date of these groups. With that df it would be :
beg end
[1,] "2018-11-22" "2018-11-30"
[2,] "2018-12-05" "2018-12-11"
Upvotes: 0
Views: 28
Reputation: 51592
Here is a base R idea,
i1 <- cumsum(c(TRUE, diff(d2$daynight) != 0))
do.call(rbind,
Filter(function(x)!is.null(x),
tapply(d2$mydate, i1, FUN = function(i)if(length(i) >= 3){c(i[1], i[length(i)])})))
which gives,
[,1] [,2] 4 "2018-11-22" "2018-11-30" 8 "2018-12-05" "2018-12-11"
DATA
dput(d2)
structure(list(mydate = c("2018-11-21", "2018-11-21", "2018-11-22",
"2018-11-22", "2018-11-23", "2018-11-24", "2018-11-25", "2018-11-26",
"2018-11-27", "2018-11-28", "2018-11-29", "2018-11-30", "2018-12-01",
"2018-12-04", "2018-12-05", "2018-12-05", "2018-12-08", "2018-12-09",
"2018-12-10", "2018-12-11", "2018-12-12"), daynight = c(1L, 2L,
1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 2L,
2L, 2L, 1L), myvar = c(64L, 4L, 7L, 2L, 0L, 0L, 64L, 6L, 0L,
0L, 20L, 4L, 64L, 1L, 4L, 0L, 30L, 15L, 0L, 14L, 28L)), row.names = c("195",
"196", "197", "198", "199", "200", "201", "202", "203", "204",
"205", "206", "207", "214", "215", "216", "217", "218", "219",
"220", "221"), class = "data.frame")
Upvotes: 0
Reputation: 16121
Example data:
df = read.table(text = "
mydate daynight myvar
195 2018-11-21 1 64
196 2018-11-21 2 4
197 2018-11-22 1 7
198 2018-11-22 2 2
199 2018-11-23 2 0
200 2018-11-24 2 0
201 2018-11-25 2 64
202 2018-11-26 2 6
203 2018-11-27 2 0
204 2018-11-28 2 0
205 2018-11-29 2 20
206 2018-11-30 2 4
207 2018-12-01 1 64
214 2018-12-04 2 1
215 2018-12-05 1 4
216 2018-12-05 2 0
217 2018-12-08 2 30
218 2018-12-09 2 15
219 2018-12-10 2 0
220 2018-12-11 2 14
221 2018-12-12 1 28
", header=T)
One approach is this:
library(dplyr)
library(data.table)
df %>%
group_by(group = rleid(daynight)) %>% # group by consequtive daynight values
summarise(val = unique(daynight), # get the daynight value
Count = n(), # count rows of that value
Beg = first(mydate), # get first date
End = last(mydate)) %>% # get last date
filter(Count > 2 & val == 2) %>% # keep only cases where you have 3+ daylight = 2
select(-group, -val) # remove unecessary columns
# # A tibble: 2 x 3
# Count Beg End
# <int> <fct> <fct>
# 1 9 2018-11-22 2018-11-30
# 2 5 2018-12-05 2018-12-11
Note that this approach assumes that your date column is already ordered (like in your example).
Upvotes: 2