KhalidN

Reputation: 405

Previous-value calculations in a data frame in R

I have the data frame shown below. Column A is the user ID. Column B is the email status, where 1 means the email was sent successfully (SendSuccess) and 2 means the email has been read. Column C is the binary counterpart of B (0 when B is 1, 1 when B is 2). The data frame is sorted by A and B.

I want a column D that counts how many times an email has been read by each user: add the value of C to the previous value of D, but reset D to 0 whenever C is 0. That is, if C(1)=0 then D(1)=0, else D(1)=1; if C(2)=0 then D(2)=0, else D(2)=1+D(1); if C(3)=0 then D(3)=0, else D(3)=1+D(2); and so on, where (1), (2), ... are row numbers.

Then I want a column E that marks the send row together with the first time the email was read. E is calculated as follows: if D(2)=1 then E(1)=1, else if D(1)=1 then E(1)=1, else E(1)=0.

Finally, I want a column F, which is the grouped maximum of D, i.e. how many times that particular email has been read.

DF <- data.frame(A=c(1,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,4,4,4,4), B=c(1,1,2,2,2,1,1,2,2,1,1,1,2,2,1,1,1,2,1,1), C=c(0,0,1,1,1,0,0,1,1,0,0,0,1,1,0,0,0,1,0,0))
DF
    A  B   C   Want_D  Want_E  Want_F
 1: 1  1   0        0       0       0
 2: 2  1   0        0       1       0
 3: 2  2   1        1       1       3
 4: 2  2   1        2       0       3
 5: 2  2   1        3       0       3
 6: 2  1   0        0       0       0
 7: 2  1   0        0       1       0
 8: 2  2   1        1       1       2
 9: 2  2   1        2       0       2
10: 3  1   0        0       0       0
11: 3  1   0        0       0       0
12: 3  1   0        0       1       0
13: 3  2   1        1       1       2
14: 3  2   1        2       0       2
15: 3  1   0        0       0       0
16: 3  1   0        0       0       0
17: 4  1   0        0       1       0
18: 4  2   1        1       1       1
19: 4  1   0        0       0       0
20: 4  1   0        0       0       0

Upvotes: 0

Views: 114

Answers (2)

Paulo MiraMor

Reputation: 1610

A solution using a for loop:

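# D: running count of consecutive reads; resets to 0 whenever C is 0
DF$D <- c(DF$C[1], rep(0, nrow(DF) - 1))
for (i in seq_len(nrow(DF))[-1]) {
  if (DF$C[i] != 0) {
    DF$D[i] <- DF$D[i - 1] + 1
  }
}

# E: flag each first read (D == 1) and the row just before it (the send)
first_read <- which(DF$D == 1)
DF$E <- rep(0, nrow(DF))
DF$E[c(first_read, first_read - 1)] <- 1

# F: length of each run of reads, repeated over the run (0 for runs of C == 0)
x <- rle(DF$C)
x$values <- x$lengths * x$values
DF$F <- rep(x$values, x$lengths)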
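
If the loop becomes slow on a large data frame, the same three columns can also be sketched without it. This is only one possible variant, not a drop-in replacement: unlike the loop, it restarts the count whenever the user in A changes, and the helper names `run`, `next_D` and `next_same_user` are introduced here for illustration. The data frame is rebuilt from the table in the question so the snippet runs on its own.

```r
# Data as printed in the question's table (last value of C is 0 there)
DF <- data.frame(
  A = c(1,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,4,4,4,4),
  B = c(1,1,2,2,2,1,1,2,2,1,1,1,2,2,1,1,1,2,1,1),
  C = c(0,0,1,1,1,0,0,1,1,0,0,0,1,1,0,0,0,1,0,0)
)
n <- nrow(DF)

# Hypothetical helper: a new run starts whenever A or C differs from the previous row
run <- cumsum(c(TRUE, DF$A[-1] != DF$A[-n] | DF$C[-1] != DF$C[-n]))

# D: cumulative sum of C inside each run (stays 0 in runs where C is 0)
DF$D <- ave(DF$C, run, FUN = cumsum)

# F: maximum of D inside each run, i.e. the number of reads in that run
DF$F <- ave(DF$D, run, FUN = max)

# E: 1 on a first read, or on the row right before a first read of the same user
next_D <- c(DF$D[-1], 0)
next_same_user <- c(DF$A[-1] == DF$A[-n], FALSE)
DF$E <- as.numeric(DF$D == 1 | (next_D == 1 & next_same_user))

DF
```

On the question's data this should reproduce Want_D, Want_E and Want_F. `ave()` applies a function per group and returns a vector of the original length, which is why no explicit split and recombine step is needed.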

Upvotes: 0

Aurèle

Reputation: 12819

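library(dplyr)

DF %>%
  group_by(A) %>%
  # each send row (C == 0) starts a new email within a user
  mutate(email = cumsum(C == 0)) %>%
  group_by(A, email) %>%
  mutate(
    D = cumsum(C),  # running read count within the email
    # flag the first read and the send row just before it
    E = as.numeric(lead(D, default = 0) == 1 | D == 1)
  ) %>%
  # F is the maximum of D per email, taken separately for send and read rows
  group_by(A, email, C) %>%
  mutate(`F` = max(D)) %>%
  ungroup()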
# # A tibble: 20 × 7
#        A     B     C email     D     E     F
#    <dbl> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
# 1      1     1     0     1     0     0     0
# 2      2     1     0     1     0     1     0
# 3      2     2     1     1     1     1     3
# 4      2     2     1     1     2     0     3
# 5      2     2     1     1     3     0     3
# 6      2     1     0     2     0     0     0
# 7      2     1     0     3     0     1     0
# 8      2     2     1     3     1     1     2
# 9      2     2     1     3     2     0     2
# 10     3     1     0     1     0     0     0
# 11     3     1     0     2     0     0     0
# 12     3     1     0     3     0     1     0
# 13     3     2     1     3     1     1     2
# 14     3     2     1     3     2     0     2
# 15     3     1     0     4     0     0     0
# 16     3     1     0     5     0     0     0
# 17     4     1     0     1     0     1     0
# 18     4     2     1     1     1     1     1
# 19     4     1     0     2     0     0     0
# 20     4     1     0     3     0     0     0
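
To see why `cumsum(C == 0)` works as a grouping key, it may help to look at one user in isolation. The sketch below uses base R only and takes the C values of user 2 from the question; `ave()` stands in for the grouped `mutate()`, which is an assumption made here to keep the example free of dependencies.

```r
# C values for user 2 (rows 2 to 9 of the question's table)
C <- c(0, 1, 1, 1, 0, 0, 1, 1)

# Every send row (C == 0) increments the counter, so it opens a new group
email <- cumsum(C == 0)
email
# 1 1 1 1 2 3 3 3

# Cumulative sum of C inside each group gives the running read count
D <- ave(C, email, FUN = cumsum)
D
# 0 1 2 3 0 0 1 2
```

Every 0 in C bumps the counter, so a send row and the reads that follow it share one `email` value, and the cumulative sum of C inside that group restarts at 0.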

Upvotes: 3
