Reputation: 283
I have an exemplary data.frame with two columns, 'month' and 'value'.
I would like to calculate the mean value for each row using the data of the current and previous month without the respective observation itself. This means the mean value of n-1 rows, where n is the number of rows of the current and previous month.
I know how to calculate a n-1 average over all rows (see column 'mean_k_1').
What is still missing is the specification of n, which means that instead of all rows, only the rows of the current and previous month should be considered.
The expected result is in column 'mean_lag'.
Here is an MWE:
month <- as.Date(c("2020-01-01", "2020-01-01", "2020-02-01", "2020-02-01", "2020-02-01", "2020-03-01",
"2020-04-01", "2020-04-01", "2020-04-01", "2020-05-01", "2020-05-01"))
value <- c(102, 110, 98, 76, 320, 419, 215, 50, 99, 46, 129)
mean_lag <- c(110, 102, 152, 157.5, 96.5, 164.7, 189.3, 244.3, 228.0, 123.3, 102.5)
df <- cbind.data.frame(month, value, mean_lag)
n <- nrow(df)
df$mean_k_1 <- (mean(df$value)*n - df$value)/(n-1)
Upvotes: 1
Views: 378
Reputation: 388807
For each value of month
we take mean
of all values for current and previous month excluding current observation.
library(dplyr)
library(lubridate)
df %>%
mutate(mean_lag_res = purrr::map_dbl(row_number(), ~{
val <- month[.x]
mean(value[-.x][between(month[-.x], val %m-% months(1), val)])
}))
# month value mean_lag mean_lag_res
#1 2020-01-01 102 110.0 110.0
#2 2020-01-01 110 102.0 102.0
#3 2020-02-01 98 152.0 152.0
#4 2020-02-01 76 157.5 157.5
#5 2020-02-01 320 96.5 96.5
#6 2020-03-01 419 164.7 164.7
#7 2020-04-01 215 189.3 189.3
#8 2020-04-01 50 244.3 244.3
#9 2020-04-01 99 228.0 228.0
#10 2020-05-01 46 123.3 123.2
#11 2020-05-01 129 102.5 102.5
Upvotes: 1