Reputation: 101
I have a problem applying a function (min) to a specific repeating time-period. Basically my data looks like in that sample:
library(xts)
start <- as.POSIXct("2018-05-18 00:00")
tseq <- seq(from = start, length.out = 1440, by = "10 mins")
Measurings <- data.frame(
Time = tseq,
Temp = sample(10:37,1440, replace = TRUE, set.seed(seed = 10)))
)
Measurings_xts <- xts(Measurings[,-1], Measurings$Time)
with much appreciated help (here), I managed to find out that min
and max
functions (contrary to mean
, which works right away in period.apply
) must be defined by a helper function and can then be calculated for logical datetime arguments(hours, days, years...) by using this solution:
colMin <- function(x, na.rm = FALSE) {
apply(x, 2, min, na.rm = na.rm)
}
epHours <- endpoints(Measurings_xts, "hours")
Measurings_min <- period.apply(Measurings_xts, epHours, colMin)
For meteorological analyses I need to calculate further minima for a less intuitive timespan, crossing the calendar day, that I fail to define in code:
I need to output the minimum nighttime temperature from e.g. 2018-05-18 19:00
to 2018-05-19 7:00
in the morning for each night in my dataset.
I have tried to move the timespan by manipulating(moving) the time column up or down, to include the nighttime in one calendar day. Since this solution is error-prone and doesn´t work for my real data, where some observations are missing. How do I use the POSIXct datetime
and/or xts
functionalities to calculate minima in this case?
Upvotes: 1
Views: 119
Reputation: 6891
You could solve this by creating your own "end points" when you use period.apply
# Choose the appropriate time ranges
z <- Measurings_xts["T19:00/T07:00"]
# Creating your own "endpoints":
epNights <- which(diff.xts(index(z), units = "mins") > 10) - 1
Subtract one off each index because the jumps are recorded at the start of the next "night interval" in the output from which()
.
Then add the last data point in the data set to your end points vector, and you can then use this in period.apply
epNights <- c(epNights, nrow(z))
Measurings_min <- period.apply(z, epNights, colMin)
Measurings_min
# [,1]
# 2018-05-18 07:00:00 10
# 2018-05-19 07:00:00 10
# 2018-05-20 07:00:00 10
# 2018-05-21 07:00:00 10
# 2018-05-22 07:00:00 10
# 2018-05-23 07:00:00 10
# 2018-05-24 07:00:00 11
# 2018-05-25 07:00:00 10
# 2018-05-26 07:00:00 10
# 2018-05-27 07:00:00 10
# 2018-05-27 23:50:00 12
Upvotes: 1
Reputation: 2229
here is one approach that works by defining a new group for each night interval
# define the time interval, e.g. from 19:00 to 7:00
from <- 19
to <- 7
hours <- as.numeric(strftime(index(Measurings_xts), format="%H"))
y <- rle(as.numeric(findInterval(hours, c(to,from)) != 1))
y$values[c(TRUE, FALSE)] <- cumsum(y$values[c(TRUE, FALSE)])
grp <- inverse.rle(y)
# grp is a grouping variable that is 0 for everything outside the
# defined interval , 1 for the first night, 2 for the second...
s <- split(Measurings_xts, grp); s$`0` <- NULL
# min_value will contain the minimum value for each night interval
min_value <- sapply(s, min)
# to see the date interval for each value
start <- sapply(s, function(x) as.character(index(x)[1]))
end <- sapply(s, function(x) as.character(index(x)[length(x)]))
data.frame(start, end, min_value)
# start end min_value
#1 2018-05-18 2018-05-18 06:50:00 10
#2 2018-05-18 19:00:00 2018-05-19 06:50:00 10
#3 2018-05-19 19:00:00 2018-05-20 06:50:00 10
#4 2018-05-20 19:00:00 2018-05-21 06:50:00 10
#5 2018-05-21 19:00:00 2018-05-22 06:50:00 10
#6 2018-05-22 19:00:00 2018-05-23 06:50:00 10
#7 2018-05-23 19:00:00 2018-05-24 06:50:00 11
#8 2018-05-24 19:00:00 2018-05-25 06:50:00 10
#9 2018-05-25 19:00:00 2018-05-26 06:50:00 10
#10 2018-05-26 19:00:00 2018-05-27 06:50:00 10
#11 2018-05-27 19:00:00 2018-05-27 23:50:00 12
Upvotes: 1