Reputation: 2773
I'd like to convert multiple time values with varying time zones, currently represented as milliseconds since 01-01-1970, to a POSIXct format.
I have the following dataset:
times <- c(1427450400291, 1428562800616, 1418651628795, 1418651938990, 1418652348281, 1418652450161)
tzones <- c("America/Los_Angeles", "Africa/Casablanca", "Africa/Casablanca", "Africa/Casablanca", "Africa/Casablanca", "Israel Standard Time")
The problem is that the as.POSIXct method only accepts a single tz value, not a vector, so I can't call it directly on the whole dataset. I tried using lapply to call it element by element, but that takes a long time for longer vectors:
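get.dates.with.timezones <- function(epoch.vec, tz.vec) {
  # Convert one element at a time so each call gets its own tz value
  res <- lapply(seq_along(epoch.vec), function(x) {
    as.POSIXct(epoch.vec[x] / 1000, origin = "1970-01-01", tz = tz.vec[x])
  })
  # Combine the per-element results into one vector
  return(do.call(c, res))
}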
So for only 1200 values, it takes almost a second.
timesX200 <- rep(times,200)
tzonesX200 <- rep(tzones,200)
system.time( get.dates.with.timezones(timesX200,tzonesX200) )
user system elapsed
0.86800000000005184 0.01999999999999602 0.88899999999921420
I'm a newbie with R, so I wonder if there are ways to improve the performance of this task. Is there a vectorized option for this problem? Additionally, it looks like the as.POSIXct() method itself has some performance issues, as indicated here.
---------- EDIT --------
Apparently it is impossible to hold a vector of POSIXct values with varying time zones. From the POSIXct documentation:
Using c on "POSIXlt" objects converts them to the current time zone, and on "POSIXct" objects drops any "tzone" attributes (even if they are all marked with the same time zone). Source
That's too bad. I wonder if there are any alternatives for dealing with date + time + varying time zone. I would be happy to hear if there are any.
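One possible workaround, sketched here as an assumption rather than an established recipe: the epoch value itself does not depend on a time zone, so the instants can live in a single POSIXct vector while the zone names stay in a parallel character vector. Local text is then produced per zone only when needed. The helper name format_local is hypothetical, and "Asia/Jerusalem" is an assumed Olson name standing in for the last zone.

```r
times  <- c(1427450400291, 1428562800616, 1418651628795)
tzones <- c("America/Los_Angeles", "Africa/Casablanca", "Asia/Jerusalem")

# One POSIXct vector holds all instants; its tz only affects printing.
instants <- as.POSIXct(times / 1000, origin = "1970-01-01", tz = "UTC")

# Hypothetical helper: render each instant in its own zone.
# format() is called once per distinct zone, not once per element.
format_local <- function(instants, tz.vec, fmt = "%Y-%m-%d %H:%M:%S %Z") {
  out <- character(length(instants))
  for (z in unique(tz.vec)) {
    idx <- tz.vec == z
    out[idx] <- format(instants[idx], format = fmt, tz = z)
  }
  out
}

local_text <- format_local(instants, tzones)
print(local_text)
```

The result is a character vector, so it is suited to display or export; arithmetic and comparisons would still be done on instants.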
Upvotes: 1
Views: 682
Reputation: 28441
I found this method to be much faster. It also outputs a list which preserves the time zones created:
f_time <- function(x,y) as.POSIXct(x/1000, origin="1970-01-01", tz=y)
s <- split(timesX200, tzonesX200)
result <- mapply(f_time, s, names(s))
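A variant of the same idea, as a sketch only: mapply simplifies its result when every group happens to have the same length, which could turn the list into a plain numeric matrix. Map always returns a list, and splitting the positions instead of the values keeps track of where each element came from. The names idx and by_zone are made up for this illustration, and "Asia/Jerusalem" is an assumed zone name.

```r
times  <- c(1427450400291, 1428562800616, 1418651628795, 1418651938990)
tzones <- c("America/Los_Angeles", "Africa/Casablanca",
            "Africa/Casablanca", "Asia/Jerusalem")

f_time <- function(x, y) as.POSIXct(x / 1000, origin = "1970-01-01", tz = y)

# Positions of each zone in the original input
idx <- split(seq_along(times), tzones)

# One vectorized conversion per zone; Map never simplifies to a matrix
by_zone <- Map(function(i, z) f_time(times[i], z), idx, names(idx))

# Each list element keeps its own tzone attribute
print(lapply(by_zone, attr, "tzone"))
```

With idx at hand, idx[["Africa/Casablanca"]] gives the original positions of the values stored in by_zone[["Africa/Casablanca"]].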
Your output does not retain the time-zone assignments. Check your output:
get.dates.with.timezones(times, tzones)
[1] "2015-03-27 06:00:00 EDT" "2015-04-09 03:00:00 EDT"
[3] "2014-12-15 08:53:48 EST" "2014-12-15 08:58:58 EST"
[5] "2014-12-15 09:05:48 EST" "2014-12-15 09:07:30 EST"
They are all coerced to the local time-zone.
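The coercion can be reproduced in isolation. This is a small sketch, assuming the behaviour of c() on POSIXct objects described in the documentation quoted in the question; the variable names are only for illustration.

```r
a <- as.POSIXct(0, origin = "1970-01-01", tz = "UTC")
b <- as.POSIXct(0, origin = "1970-01-01", tz = "America/Los_Angeles")

# Combining values marked with different zones leaves no tzone attribute,
# so printing falls back to the session's local zone
combined <- c(a, b)
print(attr(combined, "tzone"))

# The underlying instants are untouched; only the zone label is lost
print(as.numeric(combined))

# A list keeps each element's own tzone attribute
kept <- list(a, b)
print(sapply(kept, attr, "tzone"))
```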
Benchmark test:
times <- c(1427450400291, 1428562800616, 1418651628795, 1418651938990, 1418652348281, 1418652450161)
tzones <- c("America/Los_Angeles", "Africa/Casablanca", "Africa/Casablanca", "Africa/Casablanca", "Africa/Casablanca", "Israel")
timesX200 <- rep(times,200)
tzonesX200 <- rep(tzones,200)
get.dates.with.timezones <- function(epoch.vec, tz.vec) {
  # Element-by-element conversion from the question
  res <- lapply(seq_along(epoch.vec), function(x) {
    as.POSIXct(epoch.vec[x] / 1000, origin = "1970-01-01", tz = tz.vec[x])
  })
  return(do.call(c, res))
}
library(microbenchmark)
microbenchmark(
  get = get.dates.with.timezones(timesX200, tzonesX200),
  plafort = {s <- split(timesX200, tzonesX200); mapply(f_time, s, names(s))},
  times = 20L)
# Unit: microseconds
#     expr        min         lq       mean     median         uq        max neval cld
#      get 342693.638 362465.069 378195.687 372553.491 389080.277 445539.744    20   b
#  plafort    997.138   1027.731   1110.846   1107.471   1149.314   1558.473    20  a
Upvotes: 1