John F
John F

Reputation: 1072

Cumulative mean non including the current observation - using cummean and group_by while ignoring NAs

df <- data.frame(category=c("cat1","cat1","cat2","cat1","cat2","cat2","cat1","cat2"),
                 value=c(NA,2,3,4,5,NA,7,8))

I'd like to add a new column to the above dataframe which takes the cumulative mean of the value column up to the prior observation (ie not including the current observation) and not taking into account NAs. I've tried

df %>%
  group_by(category, isna = is.na(value)) %>%
  mutate(new_col = ifelse(isna, NA, cummean(lag(value))))

but cummean just doesn't know what to do with NAs and unfortunately lag generates them.

I do not want to count NAs as 0.

Upvotes: 2

Views: 345

Answers (1)

MKR
MKR

Reputation: 20095

One can workout first cummean and then take lag of the same.

library(dplyr)
df %>%
  group_by(category, isna = is.na(value)) %>%
  mutate(new_col = lag(cummean(value))) %>%
  ungroup() %>%
  select(-isna)


# # A tibble: 8 x 3
# category value new_col
# <fctr>   <dbl>   <dbl>
# 1 cat1     NA      NA   
# 2 cat1      2.00   NA   
# 3 cat2      3.00   NA   
# 4 cat1      4.00    2.00
# 5 cat2      5.00    3.00
# 6 cat2     NA      NA   
# 7 cat1      7.00    3.00
# 8 cat2      8.00    4.00

Upvotes: 3

Related Questions