geodex
geodex

Reputation: 1249

R convert hourly to daily data up to 0:00 instead of 23:00

How do you set 0:00 as end of day instead of 23:00 in an hourly data? I have this struggle while using period.apply or to.period as both return days ending at 23:00. Here is an example :

x1 = xts(seq(as.POSIXct("2018-02-01 00:00:00"), as.POSIXct("2018-02-05 23:00:00"), by="hour"), x = rnorm(120))

The following functions show periods ends at 23:00

to.period(x1, OHLC = FALSE, drop.date = FALSE, period = "days")
x1[endpoints(x1, 'days')]

So when I am aggregating the hourly data to daily, does someone have an idea how to set the end of day at 0:00?

Upvotes: 3

Views: 460

Answers (2)

FXQuantTrader
FXQuantTrader

Reputation: 6891

As already pointed out by another answer here, to.period on days computes on the data with timestamps between 00:00:00 and 23:59:59.9999999 on the day in question. so 23:00:00 is seen as the last timestamp in your data, and 00:00:00 corresponds to a value in the next day "bin".

What you can do is shift all the timestamps back 1 hour, use to.period get the daily data points from the hour points, and then using align.time to get the timestamps aligned correctly.

(More generally, to.period is useful for generating OHLCV type data, and so if you're say generating say hourly bars from ticks, it makes sense to look at all the ticks between 23:00:00 and 23:59:59.99999 in the bar creation. then 00:00:00 to 00:59:59.9999.... would form the next hourly bar and so on.)

Here is an example:

> tail(x1["2018-02-01"])
# [,1]
# 2018-02-01 18:00:00 -1.2760349
# 2018-02-01 19:00:00 -0.1496041
# 2018-02-01 20:00:00 -0.5989614
# 2018-02-01 21:00:00 -0.9691905
# 2018-02-01 22:00:00 -0.2519618
# 2018-02-01 23:00:00 -1.6081656

> head(x1["2018-02-02"])
# [,1]
# 2018-02-02 00:00:00 -0.3373271
# 2018-02-02 01:00:00  0.8312698
# 2018-02-02 02:00:00  0.9321747
# 2018-02-02 03:00:00  0.6719425
# 2018-02-02 04:00:00 -0.5597391
# 2018-02-02 05:00:00 -0.9810128

> head(x1["2018-02-03"])
# [,1]
# 2018-02-03 00:00:00  2.3746424
# 2018-02-03 01:00:00  0.8536594
# 2018-02-03 02:00:00 -0.2467268
# 2018-02-03 03:00:00 -0.1316978
# 2018-02-03 04:00:00  0.3079848
# 2018-02-03 05:00:00  0.2445634

x2 <- x1
.index(x2) <- .index(x1) - 3600

> tail(x2["2018-02-01"])
# [,1]
# 2018-02-01 18:00:00 -0.1496041
# 2018-02-01 19:00:00 -0.5989614
# 2018-02-01 20:00:00 -0.9691905
# 2018-02-01 21:00:00 -0.2519618
# 2018-02-01 22:00:00 -1.6081656
# 2018-02-01 23:00:00 -0.3373271

x.d2 <- to.period(x2, OHLC = FALSE, drop.date = FALSE, period = "days")

> x.d2
# [,1]
# 2018-01-31 23:00:00  0.12516594
# 2018-02-01 23:00:00 -0.33732710
# 2018-02-02 23:00:00  2.37464235
# 2018-02-03 23:00:00  0.51797747
# 2018-02-04 23:00:00  0.08955208
# 2018-02-05 22:00:00  0.33067734

x.d2 <- align.time(x.d2, n = 86400)

> x.d2
# [,1]
# 2018-02-01  0.12516594
# 2018-02-02 -0.33732710
# 2018-02-03  2.37464235
# 2018-02-04  0.51797747
# 2018-02-05  0.08955208
# 2018-02-06  0.33067734

Want to convince yourself? Try something like this:

x3 <- rbind(x1, xts(x = matrix(c(1,2), nrow = 2), order.by = as.POSIXct(c("2018-02-01 23:59:59.999", "2018-02-02 00:00:00"))))

x3["2018-02-01 23/2018-02-02 01"]
# [,1]
# 2018-02-01 23:00:00.000 -1.6081656
# 2018-02-01 23:59:59.999  1.0000000
# 2018-02-02 00:00:00.000 -0.3373271
# 2018-02-02 00:00:00.000  2.0000000
# 2018-02-02 01:00:00.000  0.8312698
x3.d <- to.period(x3, OHLC = FALSE, drop.date = FALSE, period = "days")


> x3.d <- align.time(x3.d, 86400)
> x3.d
[,1]
2018-02-02  1.00000000
2018-02-03 -0.09832625
2018-02-04 -0.65075506
2018-02-05 -0.09423664
2018-02-06  0.33067734

See that the value of 2 on 00:00:00 did not form the last observation in the day for 2018-02-02 (00:00:00), which went from 2018-02-01 00:00:00 to 2018-02-01 23:59:59.9999.

Of course, if you want the daily timestamp to be the start of the day, not the end of the day, which would be 2018-02-01 as start of bar for the first row, in x3.d above, you could shift back the day by one. You could do this relatively safely for most timezones, when your data doesn't involve weekend dates:

index(x3.d) = index(x3.d) - 86400

I say relatively safetly, because there are corner cases when there are time shifts in a time zone. e.g. Be careful with day light savings. Simply subtracting -86400 can be a problem when going from Sunday to Saturday in time zones where day light saving occurs:

  #e.g. bad:  day light savings occurs on this weekend for US EST
  z <- xts(x = 9, order.by = as.POSIXct("2018-03-12", tz = "America/New_York"))
  > index(z) - 86400
  [1] "2018-03-10 23:00:00 EST"

i.e. the timestamp is off by one hour, when you really want the midnight timestamp (00:00:00).

You could get around this problem using something much safer like this:

  library(lubridate)

  # right
  > index(z) - days(1)
  [1] "2018-03-11 EST"

Upvotes: 2

Amar
Amar

Reputation: 1373

I don't think this is possible because 00:00 is the start of the day. From the manual:

These endpoints are aligned in POSIXct time to the zero second of the day at the beginning, and the 59.9999th second of the 59th minute of the 23rd hour of the final day

I think the solution here is to use minutes instead of hours. Using your example:

x1 = xts(seq(as.POSIXct("2018-02-01 00:00:00"), as.POSIXct("2018-02-05 23:59:99"), by="min"), x = rnorm(7200))
to.period(x1, OHLC = FALSE, drop.date = FALSE, period = "day")
x1[endpoints(x1, 'day')]

Upvotes: 1

Related Questions