Cumulative mean non including the current observation - using cummean and group_by while ignoring NAs

Question

df <- data.frame(category=c("cat1","cat1","cat2","cat1","cat2","cat2","cat1","cat2"),
                 value=c(NA,2,3,4,5,NA,7,8))

I'd like to add a new column to the above dataframe which takes the cumulative mean of the value column up to the prior observation (ie not including the current observation) and not taking into account NAs. I've tried

df %>%
  group_by(category, isna = is.na(value)) %>%
  mutate(new_col = ifelse(isna, NA, cummean(lag(value))))

but cummean just doesn't know what to do with NAs and unfortunately lag generates them.

I do not want to count NAs as 0.

MKR · Accepted Answer

One can workout first cummean and then take lag of the same.

library(dplyr)
df %>%
  group_by(category, isna = is.na(value)) %>%
  mutate(new_col = lag(cummean(value))) %>%
  ungroup() %>%
  select(-isna)


# # A tibble: 8 x 3
# category value new_col
#       
# 1 cat1     NA      NA   
# 2 cat1      2.00   NA   
# 3 cat2      3.00   NA   
# 4 cat1      4.00    2.00
# 5 cat2      5.00    3.00
# 6 cat2     NA      NA   
# 7 cat1      7.00    3.00
# 8 cat2      8.00    4.00

Cumulative mean non including the current observation - using cummean and group_by while ignoring NAs

Answers (1)

Related Questions