Let's say I have 3 columns. The first column is user, by which the data should be grouped. Each user can have several sessions. I have an action column with some values and NAs, which I want to fill based on session and user:
For each user, fill the action column with its value until either of two conditions is met:
1- Keep filling up to and including session+1. For example, if action == A in session == 2, fill all NA values with A in sessions 2 and 3, stopping before session 4.
OR
2- Stop when a new action value appears within session+1. In this case, the new value takes over and fills until its own session+1.
df<-read.table(text="
user session action
1 1 NA
1 1 A
1 1 NA
1 1 B
1 2 NA
1 2 NA
1 3 NA
2 1 AA
2 1 NA
2 1 NA
2 2 NA
2 3 NA
2 4 AA
2 5 NA
2 6 NA
2 7 AA
2 8 NA", header = TRUE, stringsAsFactors = FALSE)
Expected result (the affected rows are marked with <--):
user session action
1 1 NA
1 1 A
1 1 A <--
1 1 B
1 2 B <--
1 2 B <--
1 3 NA
2 1 AA
2 1 AA <--
2 1 AA <--
2 2 AA <--
2 3 NA
2 4 AA
2 5 AA <--
2 6 NA
2 7 AA
2 8 AA <--
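To pin down the two rules precisely, here is a minimal base R sketch that applies them literally with a row-by-row loop, which can serve as a reference when checking vectorized answers. It assumes the rows are already sorted by user and then session (as in the sample data); fill_by_session() is a hypothetical helper name, not part of any package.

```r
# Sample data from the question
df <- read.table(text = "
user session action
1 1 NA
1 1 A
1 1 NA
1 1 B
1 2 NA
1 2 NA
1 3 NA
2 1 AA
2 1 NA
2 1 NA
2 2 NA
2 3 NA
2 4 AA
2 5 NA
2 6 NA
2 7 AA
2 8 NA", header = TRUE, stringsAsFactors = FALSE)

# Hypothetical helper: applies the two fill rules row by row.
# Assumes rows are sorted by user, then session.
fill_by_session <- function(user, session, action) {
  out <- action
  cur <- NA_character_    # most recent non-NA action for the current user
  start <- NA_integer_    # session in which `cur` appeared
  for (i in seq_along(action)) {
    # a new user starts with nothing to carry forward
    if (i > 1 && user[i] != user[i - 1]) {
      cur <- NA_character_
      start <- NA_integer_
    }
    if (!is.na(action[i])) {
      # rule 2: a new value replaces the old one and restarts the window
      cur <- action[i]
      start <- session[i]
    } else if (!is.na(cur) && session[i] <= start + 1) {
      # rule 1: fill only within sessions start and start + 1
      out[i] <- cur
    }
  }
  out
}

df$result <- fill_by_session(df$user, df$session, df$action)
df
```

On the sample data, the result column should reproduce the marked rows in the expected result above.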
Here's an inverse approach. We first fill all the action values forward within each user, and then, for each action, change the filled values back to NA once more than 2 distinct sessions have been counted since that action appeared.
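library(dplyr)

df %>%
  group_by(user) %>%
  # grp increases by 1 at every non-NA action, so each action starts a new group
  mutate(grp = cumsum(!is.na(action))) %>%
  # carry the last non-NA action forward within each user
  tidyr::fill(action) %>%
  # .add = TRUE keeps the user grouping (the older add = argument is deprecated)
  group_by(grp, .add = TRUE) %>%
  # within each (user, grp), blank out rows beyond the 2nd distinct session
  mutate(temp = replace(action, cumsum(!duplicated(session)) > 2, NA)) %>%
  ungroup() %>%
  select(-grp)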
# user session action temp
# <int> <int> <chr> <chr>
# 1 1 1 NA NA
# 2 1 1 A A
# 3 1 1 A A
# 4 1 1 B B
# 5 1 2 B B
# 6 1 2 B B
# 7 1 3 B NA
# 8 2 1 AA AA
# 9 2 1 AA AA
#10 2 1 AA AA
#11 2 2 AA AA
#12 2 3 AA NA
#13 2 4 AA AA
#14 2 5 AA AA
#15 2 6 AA NA
#16 2 7 AA AA
#17 2 8 AA AA
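One caveat: cumsum(!duplicated(session)) counts distinct sessions rather than comparing session numbers, so it matches the literal "session+1" rule only when the sessions within a user are consecutive, as they are in the sample. If session numbers can skip, a variant that compares each row's session with the session in which its action appeared may be closer to the stated rule. The sketch below assumes dplyr 1.0.0 or later and rows ordered by session within each user; start, filled and result are illustrative column names, not from the question.

```r
library(dplyr)

df <- read.table(text = "
user session action
1 1 NA
1 1 A
1 1 NA
1 1 B
1 2 NA
1 2 NA
1 3 NA
2 1 AA
2 1 NA
2 1 NA
2 2 NA
2 3 NA
2 4 AA
2 5 NA
2 6 NA
2 7 AA
2 8 NA", header = TRUE, stringsAsFactors = FALSE)

res <- df %>%
  group_by(user) %>%
  # a new group starts at every non-NA action (grp == 0 holds leading NAs)
  mutate(grp = cumsum(!is.na(action))) %>%
  group_by(grp, .add = TRUE) %>%
  mutate(
    start  = first(session),  # session where this group's action appeared
    filled = first(action),   # that action (NA for the leading-NA group)
    # keep the action only for sessions start and start + 1
    result = if_else(session <= start + 1, filled, NA_character_)
  ) %>%
  ungroup() %>%
  select(user, session, action, result)

res
```

No tidyr::fill() step is needed here, because each group's first row already carries its action. On the sample data this gives the same values as the temp column; the two approaches would differ, for example, if an action in session 2 were followed directly by rows in session 4.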