Reputation: 137
I have an R dataframe like the one below which contains each user and a list of days they were active.
User date_active
1 2018-08-27
1 2018-08-28
1 2018-08-29
1 2018-08-31
1 2018-09-01
2 2018-07-05
2 2018-07-06
2 2018-07-07
2 2018-07-08
2 2018-07-10
2 2018-07-11
2 2018-07-12
2 2018-07-20
I would like to count all the dates up until a gap, in which the count restarts. My expected output would look like the below table for the above data.
User date_active
1 3
1 2
2 4
2 3
2 1
My initial guess was to use the rank
function or difftime
but am unsure how to get the result. Any help would be most appreciated!
Upvotes: 2
Views: 413
Reputation: 388907
With dplyr
we could group_by
User
and number of days when difference of days is not equal to 1. We count the number of rows in each group then.
library(dplyr)
df %>%
group_by(User, days = c(0, cumsum(diff(date_active) != 1))) %>%
summarise(date_active = n()) %>%
select(-days)
# User date_active
# <int> <int>
#1 1 3
#2 1 2
#3 2 4
#4 2 3
#5 2 1
This is assuming that your date_active
is of class Date
or else convert it to Date
first by
df$date_active <- as.Date(df$date_active)
Creating a grouping variable is the key here. See
c(0, cumsum(diff(df$date_active) != 1))
# [1] 0 0 0 1 1 2 2 2 2 3 3 3 4
Upvotes: 6