Jibril

Reputation: 1037

R - Fastest / Most Efficient way to convert data of a column in a data frame?

I have a 5,000,000 x 6 data frame.

One of the columns, tweetSendTime, is a timestamp, which I want to change to a POSIX format so I can do things like df["tweetSendTime"] > SPECIFIC_GLOBAL_VARIABLE_DATE.

Currently, I use

foreach(j=1:len) %dopar%
{
    sendTime = combinedDF[j, "tweetSendTime"]
    ## Current format - Thu Jan 14 19:44:46  0000 2016
    sendTime = gsub(" 0000", " +0000", sendTime)
    updatedTime = strptime( sendTime, "%a %b %d %H:%M:%S %z %Y")
    combinedDF[j, "tweetSendTime"] = toString(updatedTime)
}

However, I am not convinced that this is the most efficient way to do this. Is there a better / faster way to update this column?

Upvotes: 0

Views: 223

Answers (1)

TayTay

Reputation: 7170

R is vectorized: gsub and strptime both accept a whole character vector, so you don't need to do this in a loop. In fact, looping over 5,000,000 rows one at a time will slow things down dramatically. You can convert the entire column in one command (edit, per digEmAll):

combinedDF$tweetSendTime = strptime(gsub(" 0000", " +0000", combinedDF$tweetSendTime), "%a %b %d %H:%M:%S %z %Y")

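Also check out as.POSIXct and as.POSIXlt; they may work for you. strptime returns a POSIXlt object, while as.POSIXct stores each time as a single number, which is usually the better fit for a data frame column.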
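
For illustration, here is a minimal sketch of the as.POSIXct route. The two-row combinedDF and the cutoff value are made-up sample data, not taken from the question. It also assumes an English locale, since %a and %b match day and month names in the current locale, and a platform whose strptime accepts %z on input.

```r
# Hypothetical sample data standing in for the real 5,000,000-row combinedDF
combinedDF <- data.frame(
  tweetSendTime = c("Thu Jan 14 19:44:46  0000 2016",
                    "Fri Jan 15 08:03:10  0000 2016"),
  stringsAsFactors = FALSE
)

# Restore the "+" of the UTC offset; fixed = TRUE avoids regex matching,
# and sub() is enough because each string contains the pattern once
withOffset <- sub(" 0000", " +0000", combinedDF$tweetSendTime, fixed = TRUE)

# Convert the whole column at once; tz = "UTC" is an assumption for display
combinedDF$tweetSendTime <- as.POSIXct(withOffset,
                                       format = "%a %b %d %H:%M:%S %z %Y",
                                       tz = "UTC")

# Hypothetical cutoff, playing the role of SPECIFIC_GLOBAL_VARIABLE_DATE
cutoff <- as.POSIXct("2016-01-15 00:00:00", tz = "UTC")
recent <- combinedDF[combinedDF$tweetSendTime > cutoff, , drop = FALSE]
```

Once the column is POSIXct, comparisons against another POSIXct value work directly, with no need to convert back to strings.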

Upvotes: 1
