Reputation: 627
I have 2 dataframes. One is a list of occasional events. It has a date column and a column of values.
df1 = data.frame(date = c(as.Date('2020-01-01'), as.Date('2020-02-02'), as.Date('2020-03-01')),
                 value = c(1,5,9))
I have another data frame that is a daily record. It too has a date column and a column of values.
set.seed(1)
df2 = data.frame(date = seq.Date(from = as.Date('2020-01-01'), to = as.Date('2020-04-01'), by = 1),
                 value = rnorm(92))
I want to create a new column in df1 that is the mean of df2$value from the current row's date up to, but not including, the next row's date. In this example, the first new value would be the mean of df2$value for rows 1 through 32, since row 33 is the row that matches df1$date[2]. The resulting data frame would look like the following:
        date value value_new
1 2020-01-01     1 0.1165512
2 2020-02-02     5 0.0974052
3 2020-03-01     9 0.1241778
But I have no idea how to specify that. I would also prefer the last value to be the mean of whatever data falls on or after the last value of df1$date, but I would accept an NA.
Upvotes: 0
Views: 40
Reputation: 389175
We can join df2 with df1, fill the NA values with the previous values, and get the mean of the value_new column.
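library(dplyr)

df2 %>%
  rename(value_new = value) %>%      # daily values become value_new
  left_join(df1, by = 'date') %>%    # df1's value is NA except on event dates
  tidyr::fill(value) %>%             # carry each event's value forward
  group_by(value) %>%                # relies on df1$value being unique per event
  summarise(date = first(date),
            value_new = mean(value_new))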
# A tibble: 3 x 3
# value date value_new
# <dbl> <date> <dbl>
#1 1 2020-01-01 0.117
#2 5 2020-02-02 0.0974
#3 9 2020-03-01 0.124
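If the values in df1 are not guaranteed to be unique, grouping by value would merge separate events. As a rough alternative sketch in base R (not part of the approach above), each daily row can be assigned to the most recent event date with findInterval and then averaged per interval. The variable name grp is only illustrative, and this assumes df1$date is sorted in ascending order.

# Rebuild the example data from the question
df1 <- data.frame(date = as.Date(c('2020-01-01', '2020-02-02', '2020-03-01')),
                  value = c(1, 5, 9))
set.seed(1)
df2 <- data.frame(date = seq.Date(from = as.Date('2020-01-01'), to = as.Date('2020-04-01'), by = 1),
                  value = rnorm(92))

# For each daily date, find the index of the last event date that is <= it
# (0 would mean the daily date falls before the first event)
grp <- findInterval(as.numeric(df2$date), as.numeric(df1$date))

# Average df2$value within each interval; an event with no daily rows gets NA
df1$value_new <- as.numeric(
  tapply(df2$value, factor(grp, levels = seq_len(nrow(df1))), mean)
)
df1

Here the last interval runs from the final event date to the end of df2, which matches the preferred behavior described in the question.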
Upvotes: 1