Haroon Lone

Split time series data hourly in R

I have time-series data sampled every 10 minutes. I want to split it by hour, but to my surprise split.xts does not produce the intended result. The steps I used are:

library(xts)
set.seed(123)
Sys.setenv(TZ="Asia/Kolkata")
timeind <- seq(as.POSIXct("2017-01-20 00:00:00 IST"),
               as.POSIXct("2017-01-20 23:59:59 IST"),by="10 min") #for indexing
df <- xts(runif(length(timeind),30,50),timeind) #xts object (not a data frame)
split(df,"hours",k=1)

Output is:

[[1]]
                        [,1]
2017-01-20 00:00:00 31.24343
2017-01-20 00:10:00 32.57921
2017-01-20 00:20:00 40.17684

[[2]]
                        [,1]
2017-01-20 00:30:00 41.89185
2017-01-20 00:40:00 30.93997
2017-01-20 00:50:00 31.76651
2017-01-20 01:00:00 49.07364
2017-01-20 01:10:00 34.79113
2017-01-20 01:20:00 48.13881

Expected output is:

[[1]]
                        [,1]
2017-01-20 00:00:00 31.24343
2017-01-20 00:10:00 32.57921
2017-01-20 00:20:00 40.17684
2017-01-20 00:30:00 41.89185
2017-01-20 00:40:00 30.93997
2017-01-20 00:50:00 31.76651

[[2]]
                        [,1]
2017-01-20 01:00:00 49.07364
2017-01-20 01:10:00 34.79113
2017-01-20 01:20:00 48.13881
...

Why is split.xts not working properly?

Upvotes: 2

Answers (1)

Joshua Ulrich

It's a known bug. If the index timezone is not offset from UTC by a whole number of hours, endpoints() (which split.xts uses to find the period boundaries) does not work correctly, because its calculations are based on UTC.

For example, Asia/Kolkata is UTC+05:30, so endpoints() places the hourly boundaries on local half-hours (00:30, 01:30, ...), which is exactly where your groups break.
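To see the misalignment without xts, the following base R sketch buckets 10-minute timestamps by hour the way a UTC-based calculation would: integer division of epoch seconds by 3600. It only illustrates the behavior described above and is not the actual endpoints() source; the names `times`, `secs` and `utc_bucket` are hypothetical.

```r
# Illustration only: mimics UTC-based hourly bucketing, not xts's endpoints() code
tz <- "Asia/Kolkata"  # UTC+05:30

# Two local hours of 10-minute timestamps: 00:00, 00:10, ..., 01:50
times <- seq(as.POSIXct("2017-01-20 00:00:00", tz = tz),
             as.POSIXct("2017-01-20 01:50:00", tz = tz), by = "10 min")

secs <- as.numeric(times)        # seconds since 1970-01-01 00:00:00 UTC
utc_bucket <- secs %/% 3600      # hour buckets counted from the UTC epoch

# Local midnight falls 30 minutes (1800 s) into a UTC hour
print(secs[1] %% 3600)

# Number the buckets from 0: a new bucket starts at every local :30
print(data.frame(local_time = format(times, "%H:%M", tz = tz),
                 utc_bucket = utc_bucket - utc_bucket[1]))
```

Buckets 0, 1 and 2 start at 00:00, 00:30 and 01:30 local time, matching the groups in the output shown in the question.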

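A possible work-around is to shift the index by 30 minutes before calling split, then shift each element of the result back. In the xts snippet, the index is moved back 30 minutes with .index() before splitting and forward again afterwards; because the offset is exactly half an hour, shifting in either direction lines local hours up with UTC hours. This might cause issues around daylight saving time transitions if the timezone observes DST (Asia/Kolkata does not).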
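Why a 30-minute shift realigns the groups can be checked with plain arithmetic on epoch seconds. This base R sketch (no xts involved) finds where the hour bucket changes with no shift, with a shift back, and with a shift forward, and compares that to where the local clock hour changes. The helper `boundaries()` is hypothetical and exists only for this illustration.

```r
# Illustration only: why shifting by 30 minutes realigns UTC hour buckets
tz <- "Asia/Kolkata"  # UTC+05:30
times <- seq(as.POSIXct("2017-01-20 00:00:00", tz = tz),
             as.POSIXct("2017-01-20 01:50:00", tz = tz), by = "10 min")
secs <- as.numeric(times)                                # epoch seconds (UTC)
local_hour <- as.integer(format(times, "%H", tz = tz))   # local clock hour

# Hypothetical helper: positions after which the bucket number changes
boundaries <- function(bucket) which(diff(bucket) != 0)

unshifted <- secs %/% 3600             # no shift: changes at local :30
back      <- (secs - 1800) %/% 3600    # shift back 30 minutes
forward   <- (secs + 1800) %/% 3600    # shift forward 30 minutes

print(boundaries(local_hour))  # where the local hour actually changes
print(boundaries(unshifted))
print(boundaries(back))
print(boundaries(forward))
```

The unshifted buckets change after the 3rd and 9th observations (00:20 to 00:30 and 01:20 to 01:30), while both shifted versions change only after the 6th (00:50 to 01:00), which is the local hour boundary.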

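df_adjusted <- df
# Move the index back 30 minutes so each local hour starts on a UTC hour
.index(df_adjusted) <- .index(df_adjusted) - 60 * 30
# Split by hour, then move each piece's index forward 30 minutes to restore the original times
by_hour <- lapply(split(df_adjusted, "hours"),
           function(x) { .index(x) <- .index(x) + 60 * 30; x })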
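A different technique, not part of the answer above: split.xts appears to call endpoints() only when f is a single period string such as "hours"; given a grouping vector instead, it defers to zoo's split method. Grouping by the local clock hour produced with format() therefore avoids the UTC alignment altogether. This is a sketch assuming xts (and zoo) are installed; `hour_label` and `by_hour` are hypothetical names.

```r
library(xts)

# Rebuild the question's 10-minute series with an explicit timezone
tz <- "Asia/Kolkata"
timeind <- seq(as.POSIXct("2017-01-20 00:00:00", tz = tz),
               as.POSIXct("2017-01-20 23:59:59", tz = tz), by = "10 min")
set.seed(123)
x <- xts(runif(length(timeind), 30, 50), timeind)

# Label each observation with its local calendar hour, e.g. "2017-01-20 00"
hour_label <- format(index(x), "%Y-%m-%d %H", tz = tz)

# A grouping vector (not a period name) is split with zoo's method,
# so no endpoints() call is involved
by_hour <- split(x, hour_label)

print(length(by_hour))
print(index(by_hour[[1]]))
```

Each of the 24 elements should hold the six observations from one local hour (00:00 to 00:50 for the first). In a timezone with DST, the repeated clock hour on the fall-back day would share one label and be merged into a single group.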

Upvotes: 2