pete

How to get the average temperature for N days prior to a specific date in R?

I have two datasets: one (A) has the temperature for each day, and the other (B) has each individual's id and date of birth (dob). I need the average temperature over the 3 days before each individual's dob. For example, if individual 1 was born on 02/20/2021, I need the average temperature from 02/17/2021 to 02/19/2021. Is there a way to do this in R so that my output is ind | dob | avg_temp? Here is some example data (my real data has a very large number of days and individuals):

temp <- c(26,27,28,30,32,27,28,29)
date <- as.Date(c('02-15-2021', '02-16-2021', '02-17-2021', '02-18-2021', '02-19-2021', '02-20-2021', '02-21-2021',
                  '02-22-2021'), "%m-%d-%Y")
A <- data.frame(date, temp)
id <- c(1,2,3,4,5,6,7,8,9,10)
dob <- as.Date(c('02-18-2021', '02-17-2021', '02-20-2021', '02-23-2021', '02-25-2021', '02-23-2021', '02-17-2021',
                 '02-25-2021', '02-25-2021', '02-23-2021'), "%m-%d-%Y")
B <- data.frame(id, dob)

If fewer than 3 days of temperatures are available before a dob, the average should use whichever days are available (2 or 1), and if no day is available, the average should be 0.
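To make the rule concrete, here is a minimal sketch of it for a single date of birth. `avg_prior` is a hypothetical helper name, and the dates below reproduce the example dates above (02/15/2021 to 02/22/2021) with `seq()`.

```r
# Hypothetical helper encoding the rule: average the temperatures on the
# n days strictly before `birth`, using however many of those days exist,
# or 0 if none of them is in the data.
avg_prior <- function(dates, temps, birth, n = 3) {
  in_window <- dates < birth & dates >= birth - n
  if (any(in_window)) mean(temps[in_window]) else 0
}

temp <- c(26, 27, 28, 30, 32, 27, 28, 29)
day  <- seq(as.Date("2021-02-15"), by = "day", length.out = 8)

avg_prior(day, temp, as.Date("2021-02-20"))  # 3 days: mean of 28, 30, 32 = 30
avg_prior(day, temp, as.Date("2021-02-16"))  # only 1 day available: 26
avg_prior(day, temp, as.Date("2021-02-15"))  # no days available: 0
```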

Could anyone help me do this in R? As I mentioned above, my dataset is quite large, with ~37,000 ids and temperatures from 2007 to 2021.

Thank you in advance.

Answers (1)

TrainingPizza

Here's one way. Several people share the same date of birth, so rather than repeat the calculation for every id, we'll compute the average once per unique date of birth and then merge the results back into B. The function itself is straightforward: take the rows of A from the three days before the date of birth, calculate the average, and return a data.frame that makes it easy to merge the results into B.

library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#> 
#>     date, intersect, setdiff, union
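# Average temperature over the three days before the date of birth x.
# If none of those days is in A, return 0 (mean() of an empty vector is NaN).
myfunc <- function(x){
  three_days <- as.Date(x - ddays(3))
  
  prior_days <- A[A$date < x & A$date >= three_days, ]
  avg_temp <- if (nrow(prior_days) > 0) mean(prior_days$temp) else 0
  dat <- data.frame(dob = x, avg_temp = avg_temp)
  return(dat)
}

# Each distinct date of birth is computed only once.
dobs <- unique(B$dob)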

avg_temps <- lapply(dobs, myfunc)
avg_temps <- do.call(rbind, avg_temps)

B <- merge(B, avg_temps, by = "dob")

B
#>           dob id avg_temp
#> 1  2021-02-17  2     26.5
#> 2  2021-02-17  7     26.5
#> 3  2021-02-18  1     27.0
#> 4  2021-02-20  3     30.0
#> 5  2021-02-23  4     28.0
#> 6  2021-02-23  6     28.0
#> 7  2021-02-23 10     28.0
#> 8  2021-02-25  5     29.0
#> 9  2021-02-25  8     29.0
#> 10 2021-02-25  9     29.0

Created on 2022-02-02 by the reprex package (v2.0.1)
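With ~37,000 ids, a loop-free variant may also be worth trying. The sketch below swaps in a different technique from the answer above: instead of filtering A once per date of birth, it uses base R `match()` to look up the temperature 1, 2, ..., n days before every dob at once, then averages across those lookups. `prior_avg` is a hypothetical name, and `n` generalizes the window to the "N days" in the title. It assumes A has at most one row per date (with duplicate dates, `match()` only uses the first), and it treats missing days and `NA` temperatures the same way.

```r
# Hypothetical vectorized sketch: average of the n days before each dob,
# using only the days present in A, or 0 if none are present.
prior_avg <- function(A, B, n = 3) {
  # Column k holds the temperature k days before each dob (NA if absent).
  lagged <- do.call(cbind, lapply(seq_len(n), function(k) {
    A$temp[match(B$dob - k, A$date)]
  }))
  n_avail <- rowSums(!is.na(lagged))
  total   <- rowSums(lagged, na.rm = TRUE)
  # ifelse() picks 0 where no day is available (total / 0 would be NaN).
  B$avg_temp <- ifelse(n_avail > 0, total / n_avail, 0)
  B
}

# Same example data as in the question.
A <- data.frame(
  date = seq(as.Date("2021-02-15"), by = "day", length.out = 8),
  temp = c(26, 27, 28, 30, 32, 27, 28, 29)
)
B <- data.frame(
  id  = 1:10,
  dob = as.Date(c("2021-02-18", "2021-02-17", "2021-02-20", "2021-02-23", "2021-02-25",
                  "2021-02-23", "2021-02-17", "2021-02-25", "2021-02-25", "2021-02-23"))
)

out <- prior_avg(A, B)
out
```

On the example data this gives the same averages as the merge-based answer, but keeps B in its original row order because no merge is needed.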
