Reputation: 1
I am having trouble with character to date-time conversions and would appreciate help understanding what is going wrong. To do this, I define a very simple data frame with two rows, which holds an ID, a time zone, a date, and a time for each row. I would like to add a column that contains a (say) POSIXct entry for the combined date-time including the correct time zone. (This is a synthetic example but I want to apply this to a much larger data set.)
First we try combining these features into a unified representation of the data, time and time zone using R’s base facilities.
d <- data.frame(id=c(111, 222),
tzz=c("Europe/Berlin", "US/Eastern"),
d=c("09-Sep-2017", "11-Sep-2017"),
t=c("23:42:13", "22:05:17"),
stringsAsFactors = FALSE)
d$dt <- strptime(paste(d$d, d$t), tz=d$tzz, format="%d-%b-%Y %T")
Error in strptime(paste(d$d, d$t), tz = d$tzz, format = "%d-%b-%Y %T") :
invalid 'tz' value
That approach fails, though it’s not clear to my why. For example, I can do the non-vectorized version of this easily. Also, the time zones I am using seem to be part of the officially supported list.
d$tzz %in% OlsonNames()
[1] TRUE TRUE
dt1 <- strptime(paste(d$d[1], d$t[1]), tz=d$tzz[1], format="%d-%b-%Y %T")
print(dt1)
[1] "2017-09-09 23:42:13 CEST"
print(tz(dt1))
[1] "Europe/Berlin"
dt2 <- strptime(paste(d$d[2], d$t[2]), tz=d$tzz[2], format="%d-%b-%Y %T")
print(dt2)
[1] "2017-09-11 22:05:17 EDT"
print(tz(dt2))
[1] "US/Eastern"
Also, Thinking that perhaps my problem was in misunderstanding how to use strptime, I then tried a similar approach with lubridate:
library(lubridate)
d$dt <- dmy_hms(paste(d$d, d$t), tz=d$tzz)
Error in strptime(.enclose(x), .enclose(fmt), tz) : invalid 'tz' value
but got the same error. Again, a non-vector version works fine.
dt1l <- dmy_hms(paste(d$d[1], d$t[1]), tz=d$tzz[1])
print(dt1l)
[1] "2017-09-09 23:42:13 CEST"
print(tz(dt1l))
[1] "Europe/Berlin"
Trying mutate in tidyverse yields the same problem. (Incidentally, CEST is not among the OlsonNames set.)
Help for how to do this correctly, or at least an explanation of how this is going wrong, would be much appreciated.
Upvotes: 0
Views: 369
Reputation: 368241
Similar to Gabor's but with data.table
using the fact that the ids are unique:
R> dt <- data.table(d)
R> dt[ , ct := as.POSIXct(paste(d, t), "%d-%b-%Y %H:%M:%S", tz=tzz), by=id][]
id tzz d t ct
1: 111 Europe/Berlin 09-Sep-2017 23:42:13 2017-09-09 17:42:13
2: 222 US/Eastern 11-Sep-2017 22:05:17 2017-09-11 22:05:17
R>
Upvotes: 0
Reputation: 269644
Try computing it row by row like this:
library(dplyr)
d %>%
rowwise() %>%
mutate(ct = as.POSIXct(paste(d, t), format = "%d-%b-%Y %H:%M:%S", tz = tzz)) %>%
ungroup
giving:
# A tibble: 2 x 5
id tzz d t ct
<dbl> <chr> <chr> <chr> <dttm>
1 111. Europe/Berlin 09-Sep-2017 23:42:13 2017-09-09 17:42:13
2 222. US/Eastern 11-Sep-2017 22:05:17 2017-09-11 22:05:17
Upvotes: 1