Reputation: 561
I want to compute the number of days elapsed since a user's last activity for a churn analysis.
I tried some code I found in a related topic, but it does not work:
da = da %>%
  arrange(dayid) %>%
  group_by(dayid) %>%
  mutate(dayssincelastactivity = c(NA, diff(dayid)))
Let's say this is the data. active indicates whether the user was active on that day. I want to add the variable dayssincelastactivity, which indicates the number of days elapsed since the user's last active day.
da <- data.frame(dayid = c(1,2,3,4,5,6,7,8), active = c(1,1,0,0,0,1,1,1), dayssincelastactivity = c(1,1,2,3,4,1,1,1))
da
  dayid active dayssincelastactivity
1     1      1                     1
2     2      1                     1
3     3      0                     2
4     4      0                     3
5     5      0                     4
6     6      1                     1
7     7      1                     1
8     8      1                     1
Upvotes: 1
Views: 42
Reputation: 389155
Create a grouping variable using cumsum and apply seq_along to each group.
with(da, ave(dayid, cumsum(active == 1), FUN = seq_along))
#[1] 1 1 2 3 4 1 1 1
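To see why this works, it can help to print the intermediate grouping vector. A minimal sketch using the sample data from the question (the name grp is only for illustration):

```r
# Sample data from the question, without the expected-result column
da <- data.frame(dayid = c(1, 2, 3, 4, 5, 6, 7, 8),
                 active = c(1, 1, 0, 0, 0, 1, 1, 1))

# Each active day increments the running count, so it starts a new group;
# inactive days inherit the group of the most recent active day.
grp <- cumsum(da$active == 1)
grp
#[1] 1 2 2 2 2 3 4 5

# Counting positions within each group gives the days since the last activity
ave(da$dayid, grp, FUN = seq_along)
#[1] 1 1 2 3 4 1 1 1
```

Note that under this scheme an active day itself gets the value 1, which matches the expected output in the question.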
You can also translate this to dplyr:
library(dplyr)
da %>%
  group_by(group = cumsum(active == 1)) %>%
  mutate(new_val = row_number()) %>%
  ungroup() %>%
  select(-group)
#  dayid active dayssincelastactivity new_val
#  <dbl>  <dbl>                 <dbl>   <int>
#1     1      1                     1       1
#2     2      1                     1       1
#3     3      0                     2       2
#4     4      0                     3       3
#5     5      0                     4       4
#6     6      1                     1       1
#7     7      1                     1       1
#8     8      1                     1       1
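If the real data contains several users, the running count probably needs to restart for each of them. A rough sketch, assuming a user identifier column that I call userid here (it is not in the question, and the logs data below is made up purely for illustration):

```r
library(dplyr)

# Hypothetical two-user data; userid is an assumed column name
logs <- data.frame(
  userid = rep(c("a", "b"), each = 4),
  dayid  = rep(1:4, times = 2),
  active = c(1, 0, 0, 1,   1, 1, 0, 0)
)

res <- logs %>%
  arrange(userid, dayid) %>%
  # restart the running count of active days for every user
  group_by(userid) %>%
  mutate(grp = cumsum(active == 1)) %>%
  # number the rows within each user's run since the last active day
  group_by(userid, grp) %>%
  mutate(dayssincelastactivity = row_number()) %>%
  ungroup() %>%
  select(-grp)

res$dayssincelastactivity
#[1] 1 2 3 1 1 1 2 3
```

The second group_by replaces the first grouping, so the row count is taken within each combination of user and activity run.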
Upvotes: 1