pete

Reputation: 129

How to get the average temperature for N days prior to a specific date in R, accounting for differences in station data?

This question follows on from a previous one of mine (How to get the average temperature for N days prior to a specific date in R?), but things have become a bit more complicated (at least for me).

I have two datasets: dataset A has the temperature for each day, and dataset B has each individual's id and date of birth (dob). I need the average temperature over the 3 days prior to the dob of each individual. For example, if individual 1 was born on 02/20/2021, I need the average temperature from 02/17/2021 to 02/19/2021. However, the temperature data now come from different weather stations (station in df A), and each station needs to be matched to the right farm (farm in df B). Is there a way to do that in R, so that my output is ind | dob | avg_temp, computed from the right weather station? Here is some example data (my real data has a very large number of days, individuals, farms and stations):

temp <- c(26,27,28,30,32,27,28,29)
date <- as.Date(c('02-15-2021', '02-16-2021', '02-17-2021', '02-18-2021', '02-19-2021', '02-20-2021', '02-21-2021',
                  '02-22-2021'), "%m-%d-%Y")
station <- c('A1', 'A2', 'A1', 'A2', 'A2', 'A1', 'A2', 'A1')
A <- data.frame(date, temp, station)
id <- c(1,2,3,4,5,6,7,8,9,10)
dob <- as.Date(c('02-18-2021', '02-17-2021', '02-20-2021', '02-23-2021', '02-25-2021', '02-23-2021', '02-17-2021',
                 '02-25-2021', '02-25-2021', '02-23-2021'), "%m-%d-%Y")
farm <- c('A1', 'A2', 'A1', 'A1', 'A1', 'A2', 'A2', 'A1', 'A1', 'A2')
B <- data.frame(id, dob, farm) 

Upvotes: 2

Views: 106

Answers (1)

TrainingPizza

Reputation: 1150

Thanks for the clarification. This extension of the function includes the location to calculate the average temperature. This assumes farm and station have corresponding values despite having different column names, as in your example. I've used mapply here to make sure the elements line up (i.e. x[1] goes with location[1], x[2] with location[2] and so on). That way if a particular date of birth only happens in one location it won't be calculated for all locations. I believe you could use lapply as well.
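To make the element-wise pairing concrete, here is a minimal sketch of how mapply walks two vectors in parallel. The vectors days and sites are made-up toy inputs for illustration and are not part of the question's data.

```r
# Hypothetical toy inputs, not taken from the question's data.
days <- c(1, 2, 3)
sites <- c("A1", "A2", "A1")

# mapply calls the function once per position:
# (1, "A1"), then (2, "A2"), then (3, "A1"). Three calls, not all nine combinations.
pairs <- mapply(function(x, location) paste(x, location), days, sites)
unname(pairs)
#> [1] "1 A1" "2 A2" "3 A1"
```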

library(lubridate)

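# Average temperature at `location` over the 3 days before date `x`
myfunc <- function(x, location){
  three_days <- as.Date(x - ddays(3))  # first day of the window
  
  # Keep readings from the matching station with three_days <= date < x;
  # this local A shadows the global A inside the function only
  A <- A[A$date < x & A$date >= three_days & A$station %in% location, ]
  avg_temp <- mean(A$temp)
  dat <- data.frame(dob = x, avg_temp = avg_temp, farm = location)
  return(dat)
}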

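# One row per distinct (dob, farm) pair, so each average is computed once
dobs <- unique(B[,c("dob", "farm")])

# Call myfunc on each (dob, farm) pair and stack the one-row results
avg_temps <- mapply(myfunc, x = dobs$dob, location = dobs$farm, SIMPLIFY = FALSE)
avg_temps <- do.call(rbind, avg_temps)

# Attach the averages back to every individual
B <- merge(B, avg_temps, by = c("dob", "farm"))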

B
#>           dob farm id avg_temp
#> 1  2021-02-17   A2  2       27
#> 2  2021-02-17   A2  7       27
#> 3  2021-02-18   A1  1       27
#> 4  2021-02-20   A1  3       28
#> 5  2021-02-23   A1  4       28
#> 6  2021-02-23   A2  6       28
#> 7  2021-02-23   A2 10       28
#> 8  2021-02-25   A1  5       29
#> 9  2021-02-25   A1  8       29
#> 10 2021-02-25   A1  9       29

Created on 2022-02-07 by the reprex package (v2.0.1)
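If you would rather avoid the lubridate dependency and keep B in its original row order, the same idea can be sketched in base R. This is only one possible variant, not a replacement for the code above: avg_temp_before, weather and n_days are names I made up for illustration, and returning NA when a window has no readings for that station is my own assumption about what you would want.

```r
# Toy data copied from the question.
A <- data.frame(
  date = as.Date("2021-02-15") + 0:7,
  temp = c(26, 27, 28, 30, 32, 27, 28, 29),
  station = c("A1", "A2", "A1", "A2", "A2", "A1", "A2", "A1")
)
B <- data.frame(
  id = 1:10,
  dob = as.Date(c("2021-02-18", "2021-02-17", "2021-02-20", "2021-02-23",
                  "2021-02-25", "2021-02-23", "2021-02-17", "2021-02-25",
                  "2021-02-25", "2021-02-23")),
  farm = c("A1", "A2", "A1", "A1", "A1", "A2", "A2", "A1", "A1", "A2")
)

# Hypothetical helper: mean temperature at the station matching `farm`
# over the n_days days strictly before `dob`.
avg_temp_before <- function(dob, farm, weather, n_days = 3) {
  in_window <- weather$station == farm &
    weather$date >= dob - n_days &
    weather$date < dob
  # Assumption: report NA (rather than NaN) when the window has no readings.
  if (!any(in_window)) return(NA_real_)
  mean(weather$temp[in_window])
}

# One call per row of B; the weather table is passed unchanged to every call.
B$avg_temp <- mapply(avg_temp_before, dob = B$dob, farm = B$farm,
                     MoreArgs = list(weather = A))
B$avg_temp
#> [1] 27 27 28 28 29 28 27 29 29 28
```

This calls the helper once per individual rather than once per unique (dob, farm) pair, so on very large data the unique-then-merge approach above should do less repeated work.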

Upvotes: 1
