nrj

Reputation: 53

R - Sequence of dates has different length depending on format used

Thanks in advance for your time.

I am generating a sequence of hourly times from one date to another date in R. These are the two dates:

    first_date_year_start <- as.Date("1995-1-1")
    date_end <- as.Date("2015-10-31")

Then I use two different methods to generate the sequence. The first one is converting the dates to numeric and using steps of 1/24 (1 hour):

    julDays_1hstep_simulation_period <- seq(from = 1, to = 23/24 + as.numeric(date_end-first_date_year_start) + 1, by = 1/24 )

The length of this vector is 182616.

The second approach is to change the format of the dates to one with time, and then generate the sequence:

    first_date_year_start_with_time <- strptime (paste0(as.character(first_date_year_start), " 00:00") ,format = "%Y-%m-%d %H:%M") 
    date_end_with_time <- strptime (paste0(as.character(date_end), " 23:00") ,format = "%Y-%m-%d %H:%M") 

    dates_with_times_simulation_period <- seq(from =first_date_year_start_with_time , to = date_end_with_time , by = "hour")

The length of this vector is 182615.

Why do the lengths of these vectors differ by one? It's as if there were an extra hour somewhere.

The weird thing is that if I choose an end date closer to the beginning date, such as:

    date_end <- as.Date("2015-1-3")

then the two vectors have the same length (175392).

Does anyone know the reason for this weird behavior?

Thanks again!

Upvotes: 1

Views: 273

Answers (1)

Mikael Poul Johannesson

Reputation: 1349

Your first method assumes that there are always 24 hours in a day, which is not always the case. In the United States, for instance, daylight saving time makes one day each year 23 hours long and another 25 hours long.
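As a minimal sketch of that point, the snippet below measures the elapsed hours in one calendar day around the 2015 U.S. spring transition. The zone name `America/New_York` is an assumption chosen for illustration; your session may use a different time zone.

```r
# Assumption: "America/New_York" is available in the system tz database.
tz <- "America/New_York"

# Midnight at the start and end of the 2015 spring-forward day (March 8)
day_start <- as.POSIXct("2015-03-08 00:00", tz = tz)
day_end   <- as.POSIXct("2015-03-09 00:00", tz = tz)

# Elapsed hours in that calendar day
hours_dst_day <- as.numeric(difftime(day_end, day_start, units = "hours"))
hours_dst_day
#> [1] 23

# The same measurement for the following, ordinary day
hours_normal_day <- as.numeric(difftime(
  as.POSIXct("2015-03-10 00:00", tz = tz),
  as.POSIXct("2015-03-09 00:00", tz = tz),
  units = "hours"
))
hours_normal_day
#> [1] 24
```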


Let's try your methods with two end dates, one the day before and one the day after the start of 2015 U.S. daylight saving time on March 8 (assuming your session uses a U.S. time zone).

    start <- as.Date("1995-1-1")
    end_bef <- as.Date("2015-3-7")
    end_aft <- as.Date("2015-3-9")

The two methods:

    # Assumes 24 hours in every day
    method_1 <- function(start, end) {
      out <- seq(
        from = 1,
        to = 23/24 + as.numeric(end - start) + 1,
        by = 1/24
      )
      length(out)
    }

    # Lets the date-time method of `seq()` handle daylight saving time, etc.,
    # based on the session time zone
    method_2 <- function(start, end) {

      start <- strptime(
        paste0(as.character(start), " 00:00"),
        format = "%Y-%m-%d %H:%M"
      )
      end <- strptime(
        paste0(as.character(end), " 23:00"),
        format = "%Y-%m-%d %H:%M"
      )

      length(seq(start, end, "hour"))
    }

Let's try it out:

    method_1(start, end_bef) == method_2(start, end_bef)
    #> [1] TRUE

    method_1(start, end_aft) == method_2(start, end_aft)
    #> [1] FALSE

Edit

Your original second method was correct. In my first version I counted 25 hours in the last day; that is corrected now.

Upvotes: 3
