Reputation: 129
This question follows on from a previous one of mine (How to get the average temperature for N days prior to a specific date in R?), but things have become a bit more complicated (at least for me).
I have two datasets: dataset A has the temperature for each day, and dataset B has each individual's id and date of birth (dob). I need the average temperature over the 3 days before each individual's dob. For example, if individual 1 was born on 02/20/2021, I need the average temperature from 02/17/2021 to 02/19/2021. However, the temperatures come from different weather stations (station in df A), and each individual must be matched to the station that corresponds to its farm (farm in df B). Is there a way to do this in R so that my output is ind | dob | avg_temp, using the right weather station for each individual? Here is some example data (my real data has a very large number of days, individuals, farms and stations):
temp <- c(26,27,28,30,32,27,28,29)
date <- as.Date(c('02-15-2021', '02-16-2021', '02-17-2021', '02-18-2021',
                  '02-19-2021', '02-20-2021', '02-21-2021', '02-22-2021'),
                "%m-%d-%Y")
station <- c('A1', 'A2', 'A1', 'A2', 'A2', 'A1', 'A2', 'A1')
A <- data.frame(date, temp, station)
id <- c(1,2,3,4,5,6,7,8,9,10)
dob <- as.Date(c('02-18-2021', '02-17-2021', '02-20-2021', '02-23-2021',
                 '02-25-2021', '02-23-2021', '02-17-2021', '02-25-2021',
                 '02-25-2021', '02-23-2021'),
               "%m-%d-%Y")
farm <- c('A1', 'A2', 'A1', 'A1', 'A1', 'A2', 'A2', 'A1', 'A1', 'A2')
B <- data.frame(id, dob, farm)
Upvotes: 2
Views: 106
Reputation: 1150
Thanks for the clarification. This extension of the function takes the location into account when calculating the average temperature. It assumes farm and station hold corresponding values despite having different column names, as in your example. I've used mapply here so that the elements line up (i.e. x[1] goes with location[1], x[2] with location[2], and so on). That way, if a particular date of birth only occurs at one location, it is not calculated for every location. I believe you could use lapply as well.
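As a small illustration of that pairing, here is a toy sketch. The vectors dates and farms are made-up values, not part of your data; the point is only that mapply walks both vectors in parallel and never builds every date/farm combination.

```r
# Toy vectors (hypothetical values) showing how mapply pairs its arguments
dates <- as.Date(c("2021-02-17", "2021-02-23"))
farms <- c("A2", "A1")

# The function is called twice: (dates[1], farms[1]) and (dates[2], farms[2])
pairs <- mapply(function(x, location) paste(format(x), location),
                x = dates, location = farms)
unname(pairs)
#> [1] "2021-02-17 A2" "2021-02-23 A1"
```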
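library(lubridate)

# Average temperature at `location` over the 3 days before date `x`
myfunc <- function(x, location){
  three_days <- as.Date(x - ddays(3))
  # Keep readings from the 3 days before x (x itself excluded) at the matching
  # station; this A is a local copy, the global A is left unchanged
  A <- A[A$date < x & A$date >= three_days & A$station %in% location, ]
  avg_temp <- mean(A$temp)
  dat <- data.frame(dob = x, avg_temp = avg_temp, farm = location)
  return(dat)
}

# One row per distinct dob/farm combination, so each average is computed once
dobs <- unique(B[,c("dob", "farm")])
avg_temps <- mapply(myfunc, x = dobs$dob, location = dobs$farm, SIMPLIFY = FALSE)
avg_temps <- do.call(rbind, avg_temps)

# Attach the averages back to every individual
B <- merge(B, avg_temps, by = c("dob", "farm"))
B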
#>           dob farm id avg_temp
#> 1  2021-02-17   A2  2       27
#> 2  2021-02-17   A2  7       27
#> 3  2021-02-18   A1  1       27
#> 4  2021-02-20   A1  3       28
#> 5  2021-02-23   A1  4       28
#> 6  2021-02-23   A2  6       28
#> 7  2021-02-23   A2 10       28
#> 8  2021-02-25   A1  5       29
#> 9  2021-02-25   A1  8       29
#> 10 2021-02-25   A1  9       29
Created on 2022-02-07 by the reprex package (v2.0.1)
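One thing to watch with real data: if a station has no readings in the 3-day window, mean() of an empty vector returns NaN. Below is a minimal base-R sketch of the same windowing idea that returns NA in that case instead. The helper name window_mean and its n_days argument are my own hypothetical additions, and the data frame simply rebuilds the example temperatures from the question so the snippet runs on its own.

```r
# Daily temperatures per station (same values as the question's example)
A <- data.frame(
  date    = as.Date("2021-02-15") + 0:7,
  temp    = c(26, 27, 28, 30, 32, 27, 28, 29),
  station = c("A1", "A2", "A1", "A2", "A2", "A1", "A2", "A1")
)

# Hypothetical helper: mean temperature at one station over the n_days
# days before x (x itself excluded); NA when the window has no readings
window_mean <- function(x, location, n_days = 3) {
  in_window <- A$date < x & A$date >= x - n_days & A$station == location
  if (!any(in_window)) return(NA_real_)
  mean(A$temp[in_window])
}

window_mean(as.Date("2021-02-20"), "A1")  # 28, matching id 3 above
window_mean(as.Date("2021-03-10"), "A1")  # NA: no readings in the window
```

Subtracting an integer from a Date shifts it by whole days, so this version does not need lubridate; whether NA or NaN is the better marker for a missing window depends on how you process the result afterwards.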
Upvotes: 1