Reputation: 1072
df <- data.frame(category=c("cat1","cat1","cat2","cat1","cat2","cat2","cat1","cat2"),
value=c(NA,2,3,4,5,NA,7,8))
I'd like to add a new column to the above dataframe which takes the cumulative mean of the value
column up to the prior observation (ie not including the current observation) and not taking into account NAs. I've tried
df %>%
group_by(category, isna = is.na(value)) %>%
mutate(new_col = ifelse(isna, NA, cummean(lag(value))))
but cummean
just doesn't know what to do with NAs and unfortunately lag
generates them.
I do not want to count NAs as 0.
Upvotes: 2
Views: 345
Reputation: 20095
One can workout first cummean
and then take lag
of the same.
library(dplyr)
df %>%
group_by(category, isna = is.na(value)) %>%
mutate(new_col = lag(cummean(value))) %>%
ungroup() %>%
select(-isna)
# # A tibble: 8 x 3
# category value new_col
# <fctr> <dbl> <dbl>
# 1 cat1 NA NA
# 2 cat1 2.00 NA
# 3 cat2 3.00 NA
# 4 cat1 4.00 2.00
# 5 cat2 5.00 3.00
# 6 cat2 NA NA
# 7 cat1 7.00 3.00
# 8 cat2 8.00 4.00
Upvotes: 3