Identify groups of following dates in a dataframe

Question

I've a df like this one:

    mydate daynight myvar
195 2018-11-21  1   64
196 2018-11-21  2    4
197 2018-11-22  1    7
198 2018-11-22  2    2
199 2018-11-23  2    0
200 2018-11-24  2    0
201 2018-11-25  2   64
202 2018-11-26  2    6
203 2018-11-27  2    0
204 2018-11-28  2    0
205 2018-11-29  2   20
206 2018-11-30  2    4
207 2018-12-01  1   64
214 2018-12-04  2    1
215 2018-12-05  1    4
216 2018-12-05  2    0
217 2018-12-08  2   30
218 2018-12-09  2   15
219 2018-12-10  2    0
220 2018-12-11  2   14
221 2018-12-12  1   28

I'm trying to identify groups of at least 3 following 2 in the daynight column and I want to know the beginning and end date of these groups. With that df it would be :

     beg          end         
[1,] "2018-11-22" "2018-11-30"
[2,] "2018-12-05" "2018-12-11"

AntoniosK · Accepted Answer

Example data:

df = read.table(text = "
mydate daynight myvar
195 2018-11-21  1   64
196 2018-11-21  2    4
197 2018-11-22  1    7
198 2018-11-22  2    2
199 2018-11-23  2    0
200 2018-11-24  2    0
201 2018-11-25  2   64
202 2018-11-26  2    6
203 2018-11-27  2    0
204 2018-11-28  2    0
205 2018-11-29  2   20
206 2018-11-30  2    4
207 2018-12-01  1   64
214 2018-12-04  2    1
215 2018-12-05  1    4
216 2018-12-05  2    0
217 2018-12-08  2   30
218 2018-12-09  2   15
219 2018-12-10  2    0
220 2018-12-11  2   14
221 2018-12-12  1   28
", header=T)

One approach is this:

library(dplyr)
library(data.table)

  df %>%
  group_by(group = rleid(daynight)) %>%  # group by consequtive daynight values
  summarise(val = unique(daynight),      # get the daynight value
            Count = n(),                 # count rows of that value
            Beg = first(mydate),         # get first date
            End = last(mydate)) %>%      # get last date
  filter(Count > 2 & val == 2) %>%       # keep only cases where you have 3+ daylight = 2
  select(-group, -val)                   # remove unecessary columns

# # A tibble: 2 x 3
#    Count Beg        End       
#                
# 1     9 2018-11-22 2018-11-30
# 2     5 2018-12-05 2018-12-11

Note that this approach assumes that your date column is already ordered (like in your example).

Identify groups of following dates in a dataframe

Answers (2)

Related Questions