Sahand

Reputation: 87

Using the apply function with data.table: why is this so slow?

I am using the data.table package and I used this:

dt$date <- as.POSIXct(dt$date, tz = "GMT")  # I know I can use fastPOSIXct
2009-08-07 06:00:14
2009-08-07 06:00:15
2009-08-07 06:00:16 
2009-08-07 06:00:24

I want to change the time zone (there are many of them) and extract the hour. Suppose that I want to use the apply function:

f <- function(x) {
  SydneyTime<-format(x["date"], format = "%Y-%m-%d %H:%M:%OS", tz = "Australia/Sydney")
  Sy<-hour(SydneyTime)
  return(Sy)
}

dt$SyHour <- apply(dt, 1, f)

This is too slow. Am I missing something? I don't want to keep a copy of SydneyTime.

Thanks.

Upvotes: 3

Views: 679

Answers (1)

Simon O'Hanlon

Reputation: 59970

You don't need to copy anything. format.POSIXct (the format method used for a POSIXct column) is vectorised, so you can use := to make a new column in your data.table from the data in the original column. Here is a small reproducible example:

require( data.table )
#  Seconds in the day
n <- 86400

#  Make some data
DT <- data.table( Date = as.POSIXct( Sys.time()+seq(0,2*n,by=n) , tz = "GMT") )
#                  Date
#1: 2013-08-28 21:17:10
#2: 2013-08-29 21:17:10
#3: 2013-08-30 21:17:10

#  Change the TZ
DT[ , Date2:=format( Date , tz = "Australia/Sydney")]
#                  Date               Date2
#1: 2013-08-28 21:17:10 2013-08-29 06:17:10
#2: 2013-08-29 21:17:10 2013-08-30 06:17:10
#3: 2013-08-30 21:17:10 2013-08-31 06:17:10
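If only the hour is needed, the same vectorised call can return it directly, so no intermediate Sydney-time column is kept and no row-by-row apply is required. The following is a minimal sketch, not a definitive implementation: the timestamps are taken from the question, and the column names SyHour, zone and LocalHour are made up for illustration. The second part assumes that "many time zones" means one zone per row stored in a hypothetical zone column; since the tz argument of format takes a single zone, the rows are grouped by zone.

```r
library(data.table)

# Fixed GMT timestamps (from the question) so the result is reproducible
DT <- data.table(
  Date = as.POSIXct(c("2009-08-07 06:00:14", "2009-08-07 06:00:24"), tz = "GMT")
)

# Format only the hour in the target zone, then convert it to an integer
DT[, SyHour := as.integer(format(Date, format = "%H", tz = "Australia/Sydney"))]

# Assumption: a hypothetical `zone` column holds one time zone name per row.
# Inside each group `zone` is a single string, which is what `tz` expects.
DT[, zone := c("Australia/Sydney", "GMT")]
DT[, LocalHour := as.integer(format(Date, format = "%H", tz = zone)), by = zone]
```

Sydney is ten hours ahead of GMT in August, so SyHour is 16 for both rows, while LocalHour is 16 for the Sydney row and 6 for the GMT row.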

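Edit relating to the comment below

lapply is designed to be used column-wise with a data.table. The following applies format to every column in .SD and returns the result as a new data.table (DT itself is not modified unless you assign the result back with :=):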

DT[ , lapply( .SD , format , tz = "Australia/Sydney" ) ]

But check the meaning of .SD and .SDcols before using this on your real data: without .SDcols, .SD contains every column, so format is applied to all of them, not just Date.
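To restrict the call to chosen columns and update them by reference, .SDcols can be combined with :=. This is a sketch under stated assumptions: the id column and the cols vector are hypothetical and exist only to show that other columns are left untouched.

```r
library(data.table)

DT <- data.table(
  id   = 1:2,  # hypothetical extra column that should not be formatted
  Date = as.POSIXct(c("2009-08-07 06:00:14", "2009-08-07 06:00:24"), tz = "GMT")
)

cols <- "Date"  # the date-time column(s) to convert; chosen for illustration

# .SDcols limits .SD to `cols`; (cols) := assigns the results back by reference
DT[, (cols) := lapply(.SD, format, tz = "Australia/Sydney"), .SDcols = cols]
```

Note that format returns character strings, so Date is a character column afterwards, while id is unchanged.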

Upvotes: 4
