PSz
PSz

Reputation: 1

R convert date, time and time zone strings to POSIXct

Date parsing bug

I am having trouble with character to date-time conversions and would appreciate help understanding what is going wrong. To do this, I define a very simple data frame with two rows, which holds an ID, a time zone, a date, and a time for each row. I would like to add a column that contains a (say) POSIXct entry for the combined date-time including the correct time zone. (This is a synthetic example but I want to apply this to a much larger data set.)

First we try combining these features into a unified representation of the data, time and time zone using R’s base facilities.

d <- data.frame(id=c(111, 222), 
            tzz=c("Europe/Berlin", "US/Eastern"), 
            d=c("09-Sep-2017", "11-Sep-2017"), 
            t=c("23:42:13", "22:05:17"),
            stringsAsFactors = FALSE)

d$dt <- strptime(paste(d$d, d$t), tz=d$tzz, format="%d-%b-%Y %T")

Error in strptime(paste(d$d, d$t), tz = d$tzz, format = "%d-%b-%Y %T") : 
  invalid 'tz' value

That approach fails, though it’s not clear to my why. For example, I can do the non-vectorized version of this easily. Also, the time zones I am using seem to be part of the officially supported list.

d$tzz %in% OlsonNames()

[1] TRUE TRUE

dt1 <- strptime(paste(d$d[1], d$t[1]), tz=d$tzz[1], format="%d-%b-%Y %T")
print(dt1)
[1] "2017-09-09 23:42:13 CEST"

print(tz(dt1))
[1] "Europe/Berlin"

dt2 <- strptime(paste(d$d[2], d$t[2]), tz=d$tzz[2], format="%d-%b-%Y %T")

print(dt2)
[1] "2017-09-11 22:05:17 EDT"

print(tz(dt2))
[1] "US/Eastern"

Also, Thinking that perhaps my problem was in misunderstanding how to use strptime, I then tried a similar approach with lubridate:

library(lubridate)
d$dt <- dmy_hms(paste(d$d, d$t), tz=d$tzz)

Error in strptime(.enclose(x), .enclose(fmt), tz) : invalid 'tz' value

but got the same error. Again, a non-vector version works fine.

dt1l <- dmy_hms(paste(d$d[1], d$t[1]), tz=d$tzz[1])
print(dt1l)
[1] "2017-09-09 23:42:13 CEST"

print(tz(dt1l))
[1] "Europe/Berlin"

Trying mutate in tidyverse yields the same problem. (Incidentally, CEST is not among the OlsonNames set.)

Help for how to do this correctly, or at least an explanation of how this is going wrong, would be much appreciated.

Upvotes: 0

Views: 369

Answers (2)

Dirk is no longer here
Dirk is no longer here

Reputation: 368241

Similar to Gabor's but with data.table using the fact that the ids are unique:

R> dt <- data.table(d)
R> dt[ , ct := as.POSIXct(paste(d, t), "%d-%b-%Y %H:%M:%S", tz=tzz), by=id][]
    id           tzz           d        t                  ct
1: 111 Europe/Berlin 09-Sep-2017 23:42:13 2017-09-09 17:42:13
2: 222    US/Eastern 11-Sep-2017 22:05:17 2017-09-11 22:05:17
R> 

Upvotes: 0

G. Grothendieck
G. Grothendieck

Reputation: 269644

Try computing it row by row like this:

library(dplyr)
d %>%
  rowwise() %>%
  mutate(ct = as.POSIXct(paste(d, t), format = "%d-%b-%Y %H:%M:%S", tz = tzz)) %>%
  ungroup

giving:

# A tibble: 2 x 5
     id tzz           d           t        ct                 
  <dbl> <chr>         <chr>       <chr>    <dttm>             
1  111. Europe/Berlin 09-Sep-2017 23:42:13 2017-09-09 17:42:13
2  222. US/Eastern    11-Sep-2017 22:05:17 2017-09-11 22:05:17

Upvotes: 1

Related Questions