Reputation: 53
Thanks in advance for your time.
I am generating a sequence of hourly times from one date to another date in R. These are the two dates:
first_date_year_start <- as.Date("1995-1-1")
date_end <- as.Date("2015-10-31")
Then I use two different methods to generate the sequence. The first one is converting the dates to numeric and using steps of 1/24 (1 hour):
julDays_1hstep_simulation_period <- seq(from = 1, to = 23/24 + as.numeric(date_end - first_date_year_start) + 1, by = 1/24)
The length of this vector is 182616.
The second approach is to change the format of the dates to one with time, and then generate the sequence:
first_date_year_start_with_time <- strptime(paste0(as.character(first_date_year_start), " 00:00"), format = "%Y-%m-%d %H:%M")
date_end_with_time <- strptime(paste0(as.character(date_end), " 23:00"), format = "%Y-%m-%d %H:%M")
dates_with_times_simulation_period <- seq(from = first_date_year_start_with_time, to = date_end_with_time, by = "hour")
The length of this vector is 182615.
Why do the lengths of these vectors differ by one? It's as if there were an extra hour somewhere.
The weird thing is that if I choose an end date closer to the beginning date, such as:
date_end <- as.Date("2015-1-3")
then the two vectors have the same length (175392).
Does anyone know the reason for this weird behavior?
Thanks again!
Upvotes: 1
Views: 273
Reputation: 1349
Your first method assumes that there are always 24 hours in a day, which is not always the case. In the United States, for instance, daylight saving time makes one day each year 23 hours long and another 25 hours long.
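To see the effect directly, here is a minimal sketch that measures the length of the day on which U.S. clocks moved forward in 2015. It is only an illustration, not part of your code, and it assumes the `America/New_York` zone is available in your R installation's time zone database (I picked that zone purely as an example).

```r
# Midnight at the start and at the end of 2015-03-08 in a U.S. time zone.
# "America/New_York" is an assumption chosen for illustration.
day_start <- as.POSIXct("2015-03-08 00:00", tz = "America/New_York")
day_end   <- as.POSIXct("2015-03-09 00:00", tz = "America/New_York")

# Elapsed time between the two midnights
day_length <- difftime(day_end, day_start, units = "hours")
day_length
#> Time difference of 23 hours
```

A fixed step of 1/24 of a day cannot account for this, while a date-time sequence built in a zone that observes daylight saving time does.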
Let's try your methods with two end dates, one the day before and one the day after the start of U.S. daylight saving time on March 8, 2015 (this assumes your system time zone is a U.S. zone that observes it).
start <- as.Date("1995-1-1")
end_bef <- as.Date("2015-3-7")
end_aft <- as.Date("2015-3-9")
The two methods:
# Assumes 24 hours each day
method_1 <- function(start, end) {
  out <- seq(
    from = 1,
    to = 23/24 + as.numeric(end - start) + 1,
    by = 1/24
  )
  length(out)
}
# Lets the date-time method of `seq()` handle daylight saving time, etc.,
# based on the local time zone
method_2 <- function(start, end) {
  start <- strptime(
    paste0(as.character(start), " 00:00"),
    format = "%Y-%m-%d %H:%M"
  )
  end <- strptime(
    paste0(as.character(end), " 23:00"),
    format = "%Y-%m-%d %H:%M"
  )
  length(seq(start, end, "hour"))
}
Let's try it out:
method_1(start, end_bef) == method_2(start, end_bef)
#> [1] TRUE
method_1(start, end_aft) == method_2(start, end_aft)
#> [1] FALSE
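If you want to check that the time zone alone explains the difference for your original dates, one option is to build the hourly sequence with an explicit `tz` instead of relying on the system setting. The sketch below is an assumption-laden illustration: `count_hours()` is a hypothetical helper I made up for this answer, and `America/New_York` is just an example of a zone where daylight saving time was still in effect on 2015-10-31.

```r
# Hypothetical helper: number of hourly steps from 1995-01-01 00:00
# to 2015-10-31 23:00, interpreted in the given time zone.
count_hours <- function(tz) {
  from <- as.POSIXct("1995-01-01 00:00", tz = tz)
  to   <- as.POSIXct("2015-10-31 23:00", tz = tz)
  length(seq(from, to, by = "hour"))
}

# UTC has no daylight saving time, so every day has 24 hours
count_hours("UTC")
#> [1] 182616

# Start is in standard time, end is in daylight time: one hour fewer
count_hours("America/New_York")
#> [1] 182615
```

So if every day in your data really does have 24 hours (for example, because it was recorded in UTC), passing `tz = "UTC"` when you create the date-times should make both approaches agree.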
Edit
Your original second method was correct; in my first version I counted 25 hours in the last day. Corrected now.
Upvotes: 3