ljh2001
ljh2001

Reputation: 463

How to integrate data from two datasets with different dates?

Imagine two datasets. Dataset df1 is the day of the year that a certain event occurred. Df1 has multiple other columns, but for the purpose of this question, only day matters. Df2 contains two columns, the day of the year a measurement was made, and the value of that measurement. Essentially, I want to create a new column ("AVG") in df1 that is the mean value of that day in df2 plus the value from the previous day. For example, the AVG for day 2 in df1 would be 12.5 ((10+15)/2).

Some example data below.

df1 <- structure(list(day = c(2, 5, 7)), class = "data.frame", row.names = c(NA, 
-3L))

df2 <- structure(list(day = c(1, 2, 3, 4, 5, 6, 7, 8), value = c(10, 
15, 8, 13, 7, 20, 25, 12)), class = "data.frame", row.names = c(NA, 
-8L))

Example final product below.

df3 <- structure(list(day = c(2, 5, 7), AVG = c(12.5, 10, 22.5)), class = "data.frame", row.names = c(NA, 
-3L))

Upvotes: 2

Views: 291

Answers (2)

Darren Tsai
Darren Tsai

Reputation: 35584

You can use lag() from dplyr to get the previous value so that you can calculate the rolling means.

library(dplyr)

df2 %>%
  mutate(AVG = (value + lag(value)) / 2, .keep = "unused") %>% 
  right_join(df1, by = "day")

#   day  AVG
# 1   2 12.5
# 2   5 10.0
# 3   7 22.5

Or by zoo::rollmeanr():

df2 %>%
  mutate(AVG = zoo::rollmeanr(value, 2, fill = NA), .keep = "unused") %>% 
  right_join(df1, by = "day")

#   day  AVG
# 1   2 12.5
# 2   5 10.0
# 3   7 22.5

Upvotes: 1

Duck
Duck

Reputation: 39603

You can play with indexes around positions like this using base R. It is easy to compute the mean as you only want that measure for only two values. Finally, you can assign the result to your df1:

#Detect position
i1 <- which(df2$day %in% df1$day)
#Extract values
j1 <- df2$value[i1]
j2 <- df2$value[i1-1]
#Compute mean
j3 <- (j1+j2)/2
#Assign
df1$AVG <- j3

Output:

  day  AVG
1   2 12.5
2   5 10.0
3   7 22.5

Upvotes: 1

Related Questions