chriswang123456
chriswang123456

Reputation: 501

Calculate Mean for Each Unique Value up to a certain date

Data for my example.

date1 = seq(as.Date("2019/01/01"), by = "month", length.out = 48)
date2 = seq(as.Date("2019/02/01"), by = "month", length.out = 48)
date3 = seq(as.Date("2019/02/01"), by = "month", length.out = 48)
date4 = seq(as.Date("2019/02/01"), by = "month", length.out = 48)
date = c(date1,date2,date3,date4)



subproducts1=rep("1",48)
subproducts2=rep("2",48)
subproductsx=rep("x",48)
subproductsy=rep("y",48)

b1 <- c(rnorm(48,5))
b2 <- c(rnorm(48,5))
b3 <-c(rnorm(48,5) )
b4 <- c(rnorm(48,5))

dfone <- data.frame(
                "date"= date,
               
                "subproduct"= 
                  c(subproducts1,subproducts2,subproductsx,subproductsy),
                "actuals"= c(b1,b2,b3,b4))

This creates Jan 2019 for date2,3,4 with value 0.

 dfone <-dfone %>%
 complete(date = seq.Date(from = min(date), to = as.Date('2021-06-01'), by = 'month'), 
       nesting(subproduct), fill = list(actuals = 0))

QUESTION: This calculates the mean for each unique sub product and replaces 0's with the mean of each, but how do I have a hard cutoff so the mean is only based off Jan-2019 to Dec-2020 and not Jan 2019 to Dec 2022?

library(dplyr)
dfone_new <- dfone %>%
     group_by(subproduct)  %>%
     mutate(actuals = replace(actuals, actuals == 0, 
         mean(actuals[actuals != 0], na.rm = TRUE))) %>%
     ungroup

Upvotes: 1

Views: 64

Answers (1)

akrun
akrun

Reputation: 887213

We may need one more logical expression while subsetting the 'actuals' i.e. the 'date' should be between the 2019 Jan and 2020 Dec while calculating the mean

library(dplyr)
library(tidyr)
dfone %>%
     group_by(subproduct)  %>%
     mutate(actuals = replace(actuals, actuals == 0, 
         mean(actuals[actuals != 0  & 
    between(date, as.Date("2019-01-01"), as.Date("2020-12-31"))], 
         na.rm = TRUE)))

Upvotes: 1

Related Questions