Reputation: 405
I have the data frame shown below. Column A is the user ID, B is the email status (1 means the email was sent successfully, 2 means the email has been read), and C is the binary counterpart of B (0 when B is 1, 1 when B is 2). The data frame is sorted by A and B.
I want a column D that counts how many times an email has been read by each user: a running count that adds the value of C to the previous value of D and resets to 0 whenever C is 0. With (1), (2), ... denoting row numbers: if C(1)=0 then D(1)=0, else D(1)=1; if C(2)=0 then D(2)=0, else D(2)=1+D(1); if C(3)=0 then D(3)=0, else D(3)=1+D(2); and so on.
Then I want a column E that combines the sent email with the first time it was read. E is calculated as: if D(2)=1 then E(1)=1, else if D(1)=1 then E(1)=1, else E(1)=0. In other words, E is 1 on the row where D equals 1 and on the row immediately before it.
Finally, I want a column F, which is just the grouped maximum of D, i.e. how many times that particular email has been read.
DF <- data.frame(A=c(1,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,4,4,4,4), B=c(1,1,2,2,2,1,1,2,2,1,1,1,2,2,1,1,1,2,1,1), C=c(0,0,1,1,1,0,0,1,1,0,0,0,1,1,0,0,0,1,0,0))
DF
    A B C Want_D Want_E Want_F
 1: 1 1 0      0      0      0
 2: 2 1 0      0      1      0
 3: 2 2 1      1      1      3
 4: 2 2 1      2      0      3
 5: 2 2 1      3      0      3
 6: 2 1 0      0      0      0
 7: 2 1 0      0      1      0
 8: 2 2 1      1      1      2
 9: 2 2 1      2      0      2
10: 3 1 0      0      0      0
11: 3 1 0      0      0      0
12: 3 1 0      0      1      0
13: 3 2 1      1      1      2
14: 3 2 1      2      0      2
15: 3 1 0      0      0      0
16: 3 1 0      0      0      0
17: 4 1 0      0      1      0
18: 4 2 1      1      1      1
19: 4 1 0      0      0      0
20: 4 1 0      0      0      0
Upvotes: 0
Views: 114
Reputation: 1610
A solution using a for loop:
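# D: running count of reads, reset to 0 whenever C is 0
DF$D <- c(DF$C[1], rep(0, nrow(DF) - 1))
for (i in 2:nrow(DF)) {
  if (DF$C[i] != 0) {
    DF$D[i] <- DF$D[i - 1] + 1
  }
}
# E: flag each first read (D == 1) and the row just before it
DF$E <- rep(0, nrow(DF))
DF$E[c(which(DF$D == 1), which(DF$D == 1) - 1)] <- 1
# F: length of each run of reads, repeated over that run (0 elsewhere)
x <- rle(DF$C)
x$values <- x$lengths * x$values
DF$F <- rep(x$values, x$lengths)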
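The loop above can probably be avoided. What follows is only a sketch of a vectorized base R variant, not a tested replacement: it labels each run of identical (A, C) pairs and lets `ave()` do the per-run work. `key`, `r` and `run_id` are hypothetical helper names, and the last value of C is assumed to be 0, as in the printed table in the question.

```r
# Sample data from the question (last value of C assumed to be 0,
# matching the printed table)
DF <- data.frame(
  A = c(1,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,4,4,4,4),
  B = c(1,1,2,2,2,1,1,2,2,1,1,1,2,2,1,1,1,2,1,1),
  C = c(0,0,1,1,1,0,0,1,1,0,0,0,1,1,0,0,0,1,0,0)
)

# run_id (hypothetical helper): one id per run of identical (A, C) pairs,
# so a run of reads never spans two users
key <- paste(DF$A, DF$C)
r <- rle(key)
run_id <- rep(seq_along(r$lengths), r$lengths)

# D: running count inside each run (stays 0 in runs where C is 0)
DF$D <- ave(DF$C, run_id, FUN = cumsum)

# F: total number of reads in the run (0 in runs where C is 0)
DF$F <- ave(DF$C, run_id, FUN = sum)

# E: 1 on each first read and on the row just before it;
# a zero index (first read on row 1) is silently ignored by R
first_read <- which(DF$D == 1)
DF$E <- 0
DF$E[c(first_read, first_read - 1)] <- 1
```

Unlike `rle(DF$C)` alone, building the runs on A and C together keeps two users' reads apart even if they happen to be adjacent. The E step still assumes that the row before a first read is the matching send.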
Upvotes: 0
Reputation: 12819
library(dplyr)
DF %>%
  group_by(A) %>%
  mutate(email = cumsum(C == 0)) %>%
  group_by(A, email) %>%
  mutate(
    D = cumsum(C),
    E = as.numeric(lead(D, default = 0) == 1 | D == 1)
  ) %>%
  group_by(A, email, C) %>%
  mutate(`F` = max(D)) %>%
  ungroup()
# # A tibble: 20 × 7
#        A     B     C email     D     E     F
#    <dbl> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
#  1     1     1     0     1     0     0     0
#  2     2     1     0     1     0     1     0
#  3     2     2     1     1     1     1     3
#  4     2     2     1     1     2     0     3
#  5     2     2     1     1     3     0     3
#  6     2     1     0     2     0     0     0
#  7     2     1     0     3     0     1     0
#  8     2     2     1     3     1     1     2
#  9     2     2     1     3     2     0     2
# 10     3     1     0     1     0     0     0
# 11     3     1     0     2     0     0     0
# 12     3     1     0     3     0     1     0
# 13     3     2     1     3     1     1     2
# 14     3     2     1     3     2     0     2
# 15     3     1     0     4     0     0     0
# 16     3     1     0     5     0     0     0
# 17     4     1     0     1     0     1     0
# 18     4     2     1     1     1     1     1
# 19     4     1     0     2     0     0     0
# 20     4     1     0     3     0     0     0
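The row labels in the question's printout (`1:`, `2:`, ...) suggest the data may live in a data.table, so here is a rough sketch of the same grouping idea in data.table syntax. It is an illustration rather than a drop-in answer: `DT` and the temporary `run` column are hypothetical names, and the last value of C is assumed to be 0, as in the printed table.

```r
library(data.table)

# Sample data from the question (last value of C assumed to be 0,
# matching the printed table)
DT <- data.table(
  A = c(1,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,4,4,4,4),
  B = c(1,1,2,2,2,1,1,2,2,1,1,1,2,2,1,1,1,2,1,1),
  C = c(0,0,1,1,1,0,0,1,1,0,0,0,1,1,0,0,0,1,0,0)
)

# run (hypothetical helper column): one id per run of identical (A, C) pairs
DT[, run := rleid(A, C)]

# D: running read count within a run; F: total reads in that run
DT[, c("D", "F") := list(cumsum(C), sum(C)), by = run]

# E: 1 on the first read and on the row just before it, looked up per user
DT[, E := as.numeric(D == 1 | shift(D, type = "lead", fill = 0) == 1), by = A]

# Drop the helper column again
DT[, run := NULL]
```

Here `rleid(A, C)` plays the role of the `email` grouping in the dplyr pipeline, and `shift(..., type = "lead")` corresponds to `lead()`.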
Upvotes: 3