Reputation: 2949
I have time-series data sampled at 10-minute intervals. I want to split it by hour, but to my surprise split.xts does not produce the intended result. The steps I used are:
library(xts)
set.seed(123)
Sys.setenv(TZ="Asia/Kolkata")
timeind <- seq(as.POSIXct("2017-01-20 00:00:00 IST"),
               as.POSIXct("2017-01-20 23:59:59 IST"), by = "10 min")  # index
df <- xts(runif(length(timeind), 30, 50), timeind)  # xts object
split(df,"hours",k=1)
OUTPUT IS:
[[1]]
[,1]
2017-01-20 00:00:00 31.24343
2017-01-20 00:10:00 32.57921
2017-01-20 00:20:00 40.17684
[[2]]
[,1]
2017-01-20 00:30:00 41.89185
2017-01-20 00:40:00 30.93997
2017-01-20 00:50:00 31.76651
2017-01-20 01:00:00 49.07364
2017-01-20 01:10:00 34.79113
2017-01-20 01:20:00 48.13881
Expected output is:
[[1]]
[,1]
2017-01-20 00:00:00 31.24343
2017-01-20 00:10:00 32.57921
2017-01-20 00:20:00 40.17684
2017-01-20 00:30:00 41.89185
2017-01-20 00:40:00 30.93997
2017-01-20 00:50:00 31.76651
[[2]]
2017-01-20 01:00:00 49.07364
2017-01-20 01:10:00 34.79113
2017-01-20 01:20:00 48.13881
...
Why is split.xts not working properly?
Upvotes: 2
Views: 722
Reputation: 176728
It's a known bug. If the index timezone is not a whole-hour offset from UTC, endpoints does not work correctly, because its calculations are based on UTC.
For example, Asia/Kolkata is UTC+05:30, so endpoints aligns on half-hours.
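To see why the groups break on the half-hour, it can help to bucket the timestamps by whole hours since the Unix epoch, which is a rough stand-in for a UTC-based calculation. This is only an illustrative sketch in base R, not the actual endpoints implementation; the names ts_local and utc_bucket are made up for the example.

```r
# Illustrative only: mimic a UTC-based hourly bucketing with base R.
# This is NOT the code xts uses internally; variable names are hypothetical.
ts_local <- seq(as.POSIXct("2017-01-20 00:00:00", tz = "Asia/Kolkata"),
                by = "10 min", length.out = 9)

# Seconds since the epoch are counted in UTC, whatever the display timezone is.
utc_bucket <- as.numeric(ts_local) %/% 3600

# Show each local time next to its bucket number, relative to the first bucket.
data.frame(local  = format(ts_local, "%H:%M"),
           bucket = utc_bucket - utc_bucket[1])
```

Under these assumptions the first three local timestamps (00:00 to 00:20) share one bucket and the next six (00:30 to 01:20) share the next, which is the same grouping the question reports.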
A possible workaround is to subtract 30 minutes from the index before calling split, then add the 30 minutes back to each element of the result. That might cause issues around daylight saving time, though, if the timezone observes it.
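df_adjusted <- df
# shift the index back 30 minutes so local hours line up with UTC hours
.index(df_adjusted) <- .index(df_adjusted) - 60 * 30
# split on the shifted index, then restore the original timestamps in each piece
by_hour <- lapply(split(df_adjusted, "hours"),
                  function(x) { .index(x) <- .index(x) + 60 * 30; x })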
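Another option, not part of the workaround above, might be to avoid endpoints altogether and group observations by a label formatted in the index's own timezone. The sketch below uses only base R on a plain timestamp vector to show the idea; hour_key and groups are illustrative names, and applying the position vectors to an xts object is an assumption that has not been checked against a particular xts version.

```r
# Sketch: group positions by a local-hour label instead of relying on endpoints().
timeind <- seq(as.POSIXct("2017-01-20 00:00:00", tz = "Asia/Kolkata"),
               as.POSIXct("2017-01-20 23:59:59", tz = "Asia/Kolkata"),
               by = "10 min")

# format() renders in the timestamps' own timezone, so labels follow local hours.
hour_key <- format(timeind, "%Y-%m-%d %H")

# A list of integer position vectors, one per local hour.
groups <- split(seq_along(timeind), hour_key)

# Inspect the first group: the local times it contains.
format(timeind[groups[[1]]], "%H:%M")

# With an xts object `df` indexed by `timeind`, something like
#   lapply(groups, function(i) df[i])
# should then yield one piece per local hour (untested assumption).
```

Note that in a timezone with daylight saving time, the repeated hour at the autumn transition would share one label, so the two passes through that hour would be merged into a single group.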
Upvotes: 2