Cina

Reputation: 10199

Fill a column based on another column, with respect to the value in the row and the following rows, in R

Let's say I have 3 columns. The first column is user, which the data should be grouped by. Each user can have several sessions. I have an action column with some values and NAs, which I want to fill based on session and user:

For each user, fill the action column forward with its value until either of two conditions is met:

1- Keep filling until session + 1 has passed. For example, if action == A and session == 2, fill all NA values with A up to (but not including) session 4, which covers sessions 2 and 3.

OR

2- Keep filling until a new action value appears within session + 1. In that case, the new value takes over and fills until its own session + 1.
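The two rules above can be read as a single forward pass per user: remember the most recent non-NA action and the session it occurred in, and fill an NA row only while its session is at most that session + 1. Below is a minimal base R sketch of that reading. The helper name fill_within_next_session and the filled column are hypothetical, not from the original post, and the data frame is the question's example rebuilt with data.frame() so the block is self-contained.

```r
# Hypothetical helper (not from the original post): forward-fill `action`
# for ONE user, but only while the row's session is <= (session of the last
# seen action) + 1.
fill_within_next_session <- function(session, action) {
  last_action  <- NA_character_  # most recent non-NA action seen so far
  last_session <- NA             # session in which that action occurred
  out <- action
  for (i in seq_along(action)) {
    if (!is.na(action[i])) {
      # Rule 2: a new action value takes over from here
      last_action  <- action[i]
      last_session <- session[i]
    } else if (!is.na(last_action) && session[i] <= last_session + 1) {
      # Rule 1: still within session + 1 of the last action
      out[i] <- last_action
    }
  }
  out
}

# The question's example data
df <- data.frame(
  user    = c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2),
  session = c(1, 1, 1, 1, 2, 2, 3, 1, 1, 1, 2, 3, 4, 5, 6, 7, 8),
  action  = c(NA, "A", NA, "B", NA, NA, NA,
              "AA", NA, NA, NA, NA, "AA", NA, NA, "AA", NA),
  stringsAsFactors = FALSE
)

# Apply the helper separately to each user's rows
df$filled <- df$action
for (u in unique(df$user)) {
  rows <- df$user == u
  df$filled[rows] <- fill_within_next_session(df$session[rows], df$action[rows])
}

print(df)
```

This sketch compares session numbers directly, so it assumes rows are already ordered by session within each user, as they are in the example.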

df <- read.table(text = "
user    session    action
1          1        NA
1          1        A
1          1        NA
1          1        B
1          2        NA
1          2        NA
1          3        NA  
2          1        AA
2          1        NA
2          1        NA
2          2        NA
2          3        NA
2          4        AA
2          5        NA
2          6        NA
2          7        AA
2          8        NA", header = TRUE, stringsAsFactors = FALSE)

Expected result (the affected rows are marked with <--):

user    session    action
    1          1        NA
    1          1        A
    1          1        A  <--
    1          1        B
    1          2        B  <--
    1          2        B  <--
    1          3        NA  
    2          1        AA
    2          1        AA <--
    2          1        AA <--
    2          2        AA <--
    2          3        NA
    2          4        AA
    2          5        AA <--
    2          6        NA
    2          7        AA
    2          8        AA <--

Upvotes: 0

Views: 1199

Answers (1)

Ronak Shah

Reputation: 389355

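Here's an inverse approach. We first fill all the action values forward for each user, and then change those values back to NA wherever the count of distinct sessions since the action was recorded is greater than 2. The result is in the temp column; action shows the plain forward fill for comparison.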

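library(dplyr)

df %>%
  group_by(user) %>%
  # grp increments at every non-NA action, so each action starts its own run
  mutate(grp = cumsum(!is.na(action))) %>%
  # carry each action forward within the user
  tidyr::fill(action) %>%
  # dplyr >= 1.0.0 spells this argument .add; older versions used add
  group_by(grp, .add = TRUE) %>%
  # blank out rows past the second distinct session of each run
  mutate(temp = replace(action, cumsum(!duplicated(session)) > 2, NA)) %>%
  ungroup() %>%
  select(-grp)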

#    user session action temp 
#   <int>   <int> <chr>  <chr>
# 1     1       1 NA     NA   
# 2     1       1 A      A    
# 3     1       1 A      A    
# 4     1       1 B      B    
# 5     1       2 B      B    
# 6     1       2 B      B    
# 7     1       3 B      NA   
# 8     2       1 AA     AA   
# 9     2       1 AA     AA   
#10     2       1 AA     AA   
#11     2       2 AA     AA   
#12     2       3 AA     NA   
#13     2       4 AA     AA   
#14     2       5 AA     AA   
#15     2       6 AA     NA   
#16     2       7 AA     AA   
#17     2       8 AA     AA   
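To see why the two cumsum() calls do the work, it may help to run them on user 1's rows alone in base R. This is only an illustrative sketch; the variable names run_sessions and distinct_count are made up for the example.

```r
# User 1's rows from the question
action  <- c(NA, "A", NA, "B", NA, NA, NA)
session <- c(1, 1, 1, 1, 2, 2, 3)

# Each non-NA action bumps the counter, so every action starts a new run;
# rows before the first action fall into run 0.
grp <- cumsum(!is.na(action))
print(grp)  # values: 0 1 1 2 2 2 2

# Within the run started by "B", count distinct sessions seen so far.
run_sessions   <- session[grp == 2]
distinct_count <- cumsum(!duplicated(run_sessions))
print(distinct_count)      # values: 1 2 2 3
print(distinct_count > 2)  # values: FALSE FALSE FALSE TRUE
```

The last row of the run is in the third distinct session, so it is the one reset to NA. Note that this counts distinct sessions rather than comparing session numbers, so if a user's session numbers skip values, the result could differ from a literal "session + 1" reading.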

Upvotes: 2
