R average from specific rows

Question

I have an exemplary data.frame with two columns, 'month' and 'value'.

I would like to calculate the mean value for each row using the data of the current and previous month without the respective observation itself. This means the mean value of n-1 rows, where n is the number of rows of the current and previous month.

I know how to calculate a n-1 average over all rows (see column 'mean_k_1').

What is still missing is the specification of n, which means that instead of all rows, only the rows of the current and previous month should be considered.

The expected result is in column 'mean_lag'.

Here is an MWE:

month <- as.Date(c("2020-01-01", "2020-01-01", "2020-02-01", "2020-02-01", "2020-02-01", "2020-03-01",
                   "2020-04-01", "2020-04-01", "2020-04-01", "2020-05-01", "2020-05-01"))
value <- c(102, 110, 98, 76, 320, 419, 215, 50, 99, 46, 129)
mean_lag <- c(110, 102, 152, 157.5, 96.5, 164.7, 189.3, 244.3, 228.0, 123.3, 102.5)
df <- cbind.data.frame(month, value, mean_lag)
n <- nrow(df)

df$mean_k_1 <- (mean(df$value)*n - df$value)/(n-1)

Ronak Shah · Accepted Answer

For each value of month we take mean of all values for current and previous month excluding current observation.

library(dplyr)
library(lubridate)

df %>%
  mutate(mean_lag_res = purrr::map_dbl(row_number(), ~{
    val <- month[.x]
    mean(value[-.x][between(month[-.x], val %m-% months(1), val)])
}))

#        month value mean_lag mean_lag_res
#1  2020-01-01   102    110.0        110.0
#2  2020-01-01   110    102.0        102.0
#3  2020-02-01    98    152.0        152.0
#4  2020-02-01    76    157.5        157.5
#5  2020-02-01   320     96.5         96.5
#6  2020-03-01   419    164.7        164.7
#7  2020-04-01   215    189.3        189.3
#8  2020-04-01    50    244.3        244.3
#9  2020-04-01    99    228.0        228.0
#10 2020-05-01    46    123.3        123.2
#11 2020-05-01   129    102.5        102.5

R average from specific rows

Answers (1)

Related Questions