C. Denney

Calculate the mean of values that fall between 2 dates

I have 2 dataframes. One is a list of occasional events. It has a date column and a column of values.

df1 = data.frame(date = c(as.Date('2020-01-01'), as.Date('2020-02-02'), as.Date('2020-03-01')),
                 value = c(1,5,9))

I have another data frame that is a daily record. It too has a date column and a column of values.

set.seed(1)
df2 = data.frame(date = seq.Date(from = as.Date('2020-01-01'), to = as.Date('2020-04-01'), by = 1),
                 value = rnorm(92))

I want to create a new column in df1 that is the mean of df2$value from the current row's date up to, but not including, the next row's date. In this example, the first new value would be the mean of df2 rows 1 through 32, since row 33 is the row that matches df1$date[2]. The resulting data frame would look like the following:

        date value value_new
1 2020-01-01     1 0.1165512
2 2020-02-02     5 0.0974052
3 2020-03-01     9 0.1241778

But I have no idea how to specify that. I would also prefer the last value to be the mean of whatever df2 data falls on or after the last df1$date, but I would accept an NA there.

Answers (1)

Ronak Shah

We can join df2 with df1, fill the NA values in value with the previous non-missing value, and then take the mean of the value_new column within each group.

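library(dplyr)

df2 %>%
  rename(value_new = value) %>%      # keep the daily values under a separate name
  left_join(df1, by = 'date') %>%    # value is NA except on the df1 event dates
  tidyr::fill(value) %>%             # carry each event's value forward to the next event
  group_by(value) %>%                # one group per event (assumes df1$value is unique)
  summarise(date = first(date),
            value_new = mean(value_new))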


# A tibble: 3 x 3
#   value date       value_new
#   <dbl> <date>       <dbl>
#1     1 2020-01-01    0.117 
#2     5 2020-02-02    0.0974
#3     9 2020-03-01    0.124 
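
If df1$value could contain repeated values, grouping by it would merge separate events. As an alternative, here is a minimal base R sketch that groups by the position of the event date instead. It is not the approach above; it assumes df1 is sorted by date, and the helper names idx and means are only illustrative.

```r
# Example data from the question
df1 <- data.frame(date = as.Date(c('2020-01-01', '2020-02-02', '2020-03-01')),
                  value = c(1, 5, 9))
set.seed(1)
df2 <- data.frame(date = seq.Date(from = as.Date('2020-01-01'),
                                  to = as.Date('2020-04-01'), by = 1),
                  value = rnorm(92))

# For each df2 date, find the index of the last df1 date that is <= it.
# The result is 0 for dates earlier than the first df1 date.
# Assumption: df1$date is sorted in increasing order.
idx <- findInterval(as.numeric(df2$date), as.numeric(df1$date))

# Average df2$value within each interval. Using explicit factor levels keeps
# one slot per df1 row (NA if an interval has no df2 rows) and drops idx == 0.
means <- tapply(df2$value, factor(idx, levels = seq_len(nrow(df1))), mean)

df1$value_new <- as.numeric(means)
df1
```

Because the last interval is open-ended, the final row gets the mean of all df2 values on or after the last df1$date, which is the behaviour the question prefers.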
