Omri374

Reputation: 2773

R - efficiently convert time in milliseconds to POSIXct with varying time zones

I'd like to convert multiple time values with varying time zones, currently represented as milliseconds since 01-01-1970, to a POSIXct format.

I have the following dataset:

times <- c(1427450400291, 1428562800616, 1418651628795, 1418651938990, 1418652348281, 1418652450161)
tzones <- c("America/Los_Angeles", "Africa/Casablanca", "Africa/Casablanca", "Africa/Casablanca", "Africa/Casablanca", "Israel Standard Time")

The problem is that as.POSIXct() accepts only a single tz value, not a vector, so I can't call it directly on my data. I tried using lapply to call it element by element, but that takes a long time for longer vectors:

get.dates.with.timezones <- function(epoch.vec, tz.vec) {
  res <- lapply(seq_along(epoch.vec), function(x) {
    as.POSIXct(epoch.vec[x] / 1000, origin = "1970-01-01", tz = tz.vec[x])
  })
  do.call(c, res)
}

So for only 1200 values, it takes almost a second.

timesX200 <- rep(times,200)
tzonesX200 <- rep(tzones,200)
system.time( get.dates.with.timezones(timesX200,tzonesX200) )
   user  system elapsed 
  0.868   0.020   0.889 

I'm a newbie with R, so I wonder if there are ways to improve the performance of this task. Is there a vectorized option for this problem? Additionally, it looks like as.POSIXct() itself has some performance issues, as indicated here.

---------- EDIT --------

Apparently it is impossible for a single POSIXct vector to hold varying time zones. From the POSIXct documentation:

Using c on "POSIXlt" objects converts them to the current time zone, and on "POSIXct" objects drops any "tzone" attributes (even if they are all marked with the same time zone). Source

That's too bad. I wonder if there are any alternatives for dealing with date + time + varying time zone. Would be happy to hear if there is.
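
For illustration, here is the documented behaviour (a minimal sketch; note that since R 4.1.0, c() keeps the tzone of its first argument instead of dropping it, but the combined vector still carries only one zone overall):

```r
# A POSIXct vector stores a single "tzone" attribute, so combining
# values built with different zones cannot preserve both zones.
a <- as.POSIXct(1427450400291 / 1000, origin = "1970-01-01", tz = "America/Los_Angeles")
b <- as.POSIXct(1418651628795 / 1000, origin = "1970-01-01", tz = "Africa/Casablanca")
attr(a, "tzone")        # "America/Los_Angeles"
attr(c(a, b), "tzone")  # NULL under the documentation quoted above;
                        # the first zone in R >= 4.1.0 -- never both
```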

Upvotes: 1

Views: 682

Answers (1)

Pierre L

Reputation: 28441

I found this method to be much faster. It also outputs a list, which preserves the time zones:

f_time <- function(x,y) as.POSIXct(x/1000, origin="1970-01-01", tz=y)
s <- split(timesX200, tzonesX200)
result <- mapply(f_time, s, names(s))
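
Each list element keeps the zone it was built with, which you can check with, for example (note that mapply simplifies to a matrix when all groups happen to have equal lengths; passing SIMPLIFY = FALSE guarantees a list):

```r
# Inspect the per-group time zones of the list returned above
sapply(result, attr, "tzone")
```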

Your function's output does not retain the time-zone assignments. Check it:

get.dates.with.timezones(times, tzones)
[1] "2015-03-27 06:00:00 EDT" "2015-04-09 03:00:00 EDT"
[3] "2014-12-15 08:53:48 EST" "2014-12-15 08:58:58 EST"
[5] "2014-12-15 09:05:48 EST" "2014-12-15 09:07:30 EST"

They are all coerced to the local time-zone.

benchmark test

times <- c(1427450400291, 1428562800616, 1418651628795, 1418651938990, 1418652348281, 1418652450161)
tzones <- c("America/Los_Angeles", "Africa/Casablanca", "Africa/Casablanca", "Africa/Casablanca", "Africa/Casablanca", "Israel")

timesX200 <- rep(times,200)
tzonesX200 <- rep(tzones,200)


get.dates.with.timezones <- function(epoch.vec, tz.vec) {
  res <- lapply(seq_along(epoch.vec), function(x) {
    as.POSIXct(epoch.vec[x] / 1000, origin = "1970-01-01", tz = tz.vec[x])
  })
  do.call(c, res)
}

library(microbenchmark)
microbenchmark(
  get = get.dates.with.timezones(timesX200, tzonesX200),
  plafort = {s <- split(timesX200, tzonesX200);mapply(f_time, s, names(s))},
  times=20L)
# Unit: microseconds
#     expr        min         lq       mean     median         uq        max neval cld
#      get 342693.638 362465.069 378195.687 372553.491 389080.277 445539.744    20   b
#  plafort    997.138   1027.731   1110.846   1107.471   1149.314   1558.473    20  a
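
The speedup comes from vectorization: with split(), as.POSIXct() runs once per unique time zone instead of once per element. A rough sketch of the call counts for the benchmark vectors:

```r
length(timesX200)           # 1200 -> one as.POSIXct() call each with lapply
length(unique(tzonesX200))  # 3    -> one vectorized call each after split()
```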

Upvotes: 1
