My ultimate goal is to use R to model hourly temperatures from daily temperature maxima and minima, spanning the years 1986 to 2017. I've successfully written code for a single date's data, but am having trouble applying this code across many dates.
I obtained daily temperature data from the Natural Resources Conservation Service (NRCS) for my focal site here: https://wcc.sc.egov.usda.gov/nwcc/site?sitenum=526
Following the model published in:
Reicosky, D.S., Winkelman, L.J., Baker, J.M., Baker, D.G. 1989. Accuracy of hourly air temperatures calculated from daily minima and maxima. Agricultural and Forest Meteorology. 46:193-209
I wrote the following code, which works great for modelling a single day's hourly temperature data:
#create df for SINGLE DATE.
#The actual data frame that I wish to model temperatures from will be exactly like this
#but with 11,689 rows.
d8a <- data.frame(
Day.of.Year = 213,
Date = as.Date("01-Aug-2011",format = "%d-%b-%Y"),
SunRise_decimal = 4.9,
Air.Temperature.Minimum..degC. = 8.0,
Air.Temperature.Maximum..degC. = 22.1
)
#create a data frame (24 rows, one per hour) to serve as repository for modeled hourly temp data
OneDay <- data.frame(hour = 0:23)
hour <- OneDay$hour
rise <- OneDay$sunrise <- d8a$SunRise_decimal
tmax <- OneDay$tmax <- d8a$Air.Temperature.Maximum..degC.
tmin <- OneDay$tmin <- d8a$Air.Temperature.Minimum..degC.
tavg <- OneDay$tavg <- (OneDay$tmax + OneDay$tmin) / 2
peakhour <- OneDay$peakhour <- 14
amp <- OneDay$amp <- (OneDay$tmax - OneDay$tmin)/2
#Now for the actual modelling:
OneDay$tmod <- ifelse(hour < rise, tavg + amp * cos(pi * (hour + 10) / (10 + rise)),
                ifelse(hour > peakhour, tavg + amp * cos(pi * (hour - peakhour) / (10 + rise)),
                ifelse(hour >= rise, tavg - amp * cos(pi * (hour - rise) / (peakhour - rise)),
                       99999)))
plot(tmod ~ hour, data = OneDay, pch = 19, cex = 1.5, ylim = c(8,23),
main = "01 August 2011", las = 1, ylab = "Temp (C)", xlab = "Hour of Day")
lines(tmod ~ hour, data = OneDay)
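For reference, the three branches of the nested ifelse() above correspond to the piecewise curve below, where $h$ is the hour, $r$ the sunrise hour (SunRise_decimal), $p = 14$ the assumed hour of the daily maximum (peakhour), $\bar T = (T_{max} + T_{min})/2$ (tavg) and $A = (T_{max} - T_{min})/2$ (amp). This is a transcription of the code above, not a quotation of the equations in Reicosky et al. (1989).

$$
T(h) =
\begin{cases}
\bar T + A\cos\!\left(\pi\,\dfrac{h + 10}{10 + r}\right), & h < r \\[6pt]
\bar T - A\cos\!\left(\pi\,\dfrac{h - r}{p - r}\right), & r \le h \le p \\[6pt]
\bar T + A\cos\!\left(\pi\,\dfrac{h - p}{10 + r}\right), & h > p
\end{cases}
$$

At $h = r$ both the first and second branches give $\bar T - A = T_{min}$, and at $h = p$ the second and third give $\bar T + A = T_{max}$. The constant 10 equals $24 - p$, the hours from the 14:00 peak to midnight, so the evening branch returns to $T_{min}$ at $h = 24 + r$ (the next sunrise); if peakhour is ever changed, the 10 has to change with it. The final 99999 fallback can never be reached, because every hour satisfies either hour < rise or hour >= rise.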
Finally, my question:
How can I iterate this code (or a more efficient version of this code) over every date in a data frame comprised of many dates?
I realize the final data set will be huge: 1986 to 2017 is 32 years, so roughly 32 years * 365 days per year * 24 hours per day = 280,320 rows (exactly 11,689 days * 24 hours = 280,536 rows).
Seems like data.table could make this easy!
First, enclose your modeling logic in a function:
ModelHourly <- function(hour, rise, tmax, tmin) {
  peakhour <- 14
  tavg <- (tmax + tmin) / 2
  amp <- (tmax - tmin) / 2
  tmod <- ifelse(hour < rise, tavg + amp * cos(pi * (hour + 10) / (10 + rise)),
          ifelse(hour > peakhour, tavg + amp * cos(pi * (hour - peakhour) / (10 + rise)),
          ifelse(hour >= rise, tavg - amp * cos(pi * (hour - rise) / (peakhour - rise)),
                 99999)))
  return(tmod)
}
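The hard-coded 10 in ModelHourly() is 24 - peakhour (the hours from the 14:00 peak to midnight), so it only matches a peak at 14:00. If you ever want to experiment with a different peak hour, here is a minimal sketch of a generalized version. ModelHourlyPeak() and its peakhour argument are hypothetical names, not part of the original answer; with the default peakhour = 14 it computes the same values as ModelHourly(), minus the unreachable 99999 branch.

```r
# Hypothetical generalization of ModelHourly(): peakhour is an argument and the
# hard-coded 10 becomes 24 - peakhour (which is 10 for the default peak of 14).
ModelHourlyPeak <- function(hour, rise, tmax, tmin, peakhour = 14) {
  tavg  <- (tmax + tmin) / 2  # daily mean temperature
  amp   <- (tmax - tmin) / 2  # half the daily range
  night <- 24 - peakhour      # hours from the afternoon peak to midnight
  ifelse(hour < rise,
         # midnight to sunrise: continuation of the night-time cooling curve
         tavg + amp * cos(pi * (hour + night) / (night + rise)),
         ifelse(hour > peakhour,
                # after the peak: cooling towards tmin at the next sunrise
                tavg + amp * cos(pi * (hour - peakhour) / (night + rise)),
                # sunrise to peak: warming from tmin to tmax
                tavg - amp * cos(pi * (hour - rise) / (peakhour - rise))))
}

# Same inputs as the 01 August 2011 example in the question
tmod <- ModelHourlyPeak(0:23, rise = 4.9, tmax = 22.1, tmin = 8.0)
tmod[15]  # hour 14, the assumed peak; should equal tmax
```

Because ifelse() is vectorized, the function can be applied to whole columns at once, exactly as ModelHourly() is used below.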
Now set up an example dataset covering two days:
d8a <- data.frame(
Day.of.Year = 213,
Date = as.Date("01-Aug-2011",format = "%d-%b-%Y"),
SunRise_decimal = 4.9,
Air.Temperature.Minimum..degC. = 8.0,
Air.Temperature.Maximum..degC. = 22.1
)
d9a <-
data.frame(
Day.of.Year = 214,
Date = as.Date("02-Aug-2011",format = "%d-%b-%Y"),
SunRise_decimal = 5.0,
Air.Temperature.Minimum..degC. = 7.0,
Air.Temperature.Maximum..degC. = 25.1
)
dat <- rbind(d8a, d9a)
Turn it into a data.table
library('data.table')
dat <- as.data.table(dat)
Now we need to replicate each row 24 times and fill a new column with 0:23. This seemed the easiest way to do that conceptually, but there are probably slicker approaches:
hourly <- dat[, .(hour=0:23), .(Date)]
dat <- merge(hourly, dat, by='Date')
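For comparison, the same expansion (each daily row repeated 24 times, plus an hour column running 0:23) can be done in base R without a join, by indexing rows with rep(). This is a sketch on a two-day data frame with the same columns and values as d8a and d9a above; the name daily is just for illustration, and ISO date strings are used so parsing does not depend on the locale's month names.

```r
# Two example days with the same columns as d8a and d9a above
daily <- data.frame(
  Day.of.Year = c(213, 214),
  Date = as.Date(c("2011-08-01", "2011-08-02")),
  SunRise_decimal = c(4.9, 5.0),
  Air.Temperature.Minimum..degC. = c(8.0, 7.0),
  Air.Temperature.Maximum..degC. = c(22.1, 25.1)
)

# Repeat each daily row 24 times: row 1 x24, then row 2 x24, ...
hourly <- daily[rep(seq_len(nrow(daily)), each = 24), ]
# Hour of day 0:23 within each block of 24 rows
hourly$hour <- rep(0:23, times = nrow(daily))
# Drop the "1.1", "1.2", ... row names created by the repeated indices
rownames(hourly) <- NULL

head(hourly[, c("Date", "hour")], 3)
```

The row order stays day by day, so hour always cycles 0 to 23 within each date.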
If you're unfamiliar with data.table, what I've done is create a new table (hourly) which has a column called "hour" that is 0:23, and I do this by each Date. Then we merge it back to the original data table on the Date column.
Now it's simply a matter of calling your function!
dat[, modeled := ModelHourly(hour, SunRise_decimal, Air.Temperature.Maximum..degC., Air.Temperature.Minimum..degC.)]
If you plot(dat$modeled), you'll see two sine-like daily curves, one per date.
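With about 280,000 hourly rows, it can help to give each row a real timestamp so the series is plotted on a date-time axis rather than against row index. A minimal sketch in base R, assuming a table with a Date column and an integer hour column as above, and treating times as UTC (an assumption; the time zone of the NRCS data is not stated here):

```r
# Small stand-in for the hourly table: two days x 24 hours
hourly <- data.frame(
  Date = rep(as.Date(c("2011-08-01", "2011-08-02")), each = 24),
  hour = rep(0:23, times = 2)
)

# Midnight of each date in UTC, plus the hour offset in seconds
hourly$datetime <- as.POSIXct(format(hourly$Date), tz = "UTC") + hourly$hour * 3600

head(hourly$datetime, 3)
```

With the data.table from this answer, the equivalent is dat[, datetime := as.POSIXct(format(Date), tz = "UTC") + hour * 3600], after which plot(modeled ~ datetime, data = dat, type = "l") draws the curves on a continuous time axis.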
A very simple approach would be a for loop. You could also do something with apply, but a loop should be sufficient here, especially since it's only about 11,000 iterations (...).
Let's assume that your data is saved in the data frame d8a:
OneDay <- list()
for (i in seq_len(nrow(d8a))) {
  hour <- 0:23
  rise <- d8a$SunRise_decimal[i]
  tmax <- d8a$Air.Temperature.Maximum..degC.[i]
  tmin <- d8a$Air.Temperature.Minimum..degC.[i]
  tavg <- (tmax + tmin) / 2
  peakhour <- 14
  amp <- (tmax - tmin) / 2
  #Now for the actual modelling:
  tmod <- ifelse(hour < rise, tavg + amp * cos(pi * (hour + 10) / (10 + rise)),
          ifelse(hour > peakhour, tavg + amp * cos(pi * (hour - peakhour) / (10 + rise)),
          ifelse(hour >= rise, tavg - amp * cos(pi * (hour - rise) / (peakhour - rise)),
                 99999)))
  #store one 24-row data frame per day, with named columns
  OneDay[[i]] <- data.frame(Date = d8a$Date[i], hour, sunrise = rise, tmax, tmin,
                            tavg, peakhour, amp, tmod)
}
This will probably help you understand the code better, since it's essentially your code wrapped in a loop. Each day is now stored as a separate element of the list OneDay; you can combine them later (for example with do.call(rbind, OneDay)) or just leave them as they are.
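Since apply-style functions were mentioned, here is a minimal sketch of the same per-day computation written with lapply(), followed by stacking the per-day data frames into one long table with do.call(rbind, ...). model_one_day() is a hypothetical helper name, and the two-day d8a below is only a stand-in for the real 11,689-row data frame.

```r
# Stand-in for the real d8a: two days with the question's column names
d8a <- data.frame(
  Date = as.Date(c("2011-08-01", "2011-08-02")),
  SunRise_decimal = c(4.9, 5.0),
  Air.Temperature.Minimum..degC. = c(8.0, 7.0),
  Air.Temperature.Maximum..degC. = c(22.1, 25.1)
)

# Hypothetical helper: model the 24 hourly temperatures for row i of d8a
model_one_day <- function(i) {
  hour <- 0:23
  peakhour <- 14
  rise <- d8a$SunRise_decimal[i]
  tmax <- d8a$Air.Temperature.Maximum..degC.[i]
  tmin <- d8a$Air.Temperature.Minimum..degC.[i]
  tavg <- (tmax + tmin) / 2
  amp  <- (tmax - tmin) / 2
  # same three branches as the question; the unreachable 99999 case is omitted
  tmod <- ifelse(hour < rise, tavg + amp * cos(pi * (hour + 10) / (10 + rise)),
          ifelse(hour > peakhour, tavg + amp * cos(pi * (hour - peakhour) / (10 + rise)),
                 tavg - amp * cos(pi * (hour - rise) / (peakhour - rise))))
  data.frame(Date = d8a$Date[i], hour, tmod)
}

# One data frame per day, like the list built by the for loop ...
OneDay <- lapply(seq_len(nrow(d8a)), model_one_day)
# ... then stacked into a single long table (24 rows per day)
all_days <- do.call(rbind, OneDay)
```

lapply() returns the list directly, so there is no need to pre-create OneDay or index into it inside a loop.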