Reputation: 1037
I have a 5,000,000 x 6 data frame.
One of the columns, tweetSendTime, is a timestamp, which I want to change to a POSIX format so I can do things like df["tweetSendTime"] > SPECIFIC_GLOBAL_VARIABLE_DATE.
Currently, I use
foreach(j=1:len) %dopar%
{
sendTime = combinedDF[j, "tweetSendTime"]
## Current format - Thu Jan 14 19:44:46 0000 2016
sendTime = gsub(" 0000", " +0000", sendTime)
updatedTime = strptime( sendTime, "%a %b %d %H:%M:%S %z %Y")
combinedDF[j, "tweetSendTime"] = toString(updatedTime)
}
However, I am not convinced that this is the most efficient way to do this. Is there a better / faster way to update this array?
Upvotes: 0
Views: 223
Reputation: 7170
R is vectorized; you don't need to do this in a loop. In fact, the loop will slow things down dramatically. You can convert the entire column in one command (edit, per digEmAll):
combinedDF$tweetSendTime = strptime(gsub(" 0000", " +0000", combinedDF$tweetSendTime), "%a %b %d %H:%M:%S %z %Y")
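Also check out as.POSIXct; it accepts the same format argument and returns a POSIXct vector, which is usually a better fit for a data frame column than the POSIXlt object that strptime returns.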
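To make the vectorized approach concrete, here is a minimal sketch using as.POSIXct, including the kind of date comparison the question asks about. The two sample rows and the names fixed, cutoff and recent are made up for illustration, and tz = "UTC" is an assumption based on the 0000 offset in the example timestamp. Note that %a and %b depend on the locale, so this assumes English day and month names.

```r
# Hypothetical sample data in the question's timestamp format.
combinedDF <- data.frame(
  tweetSendTime = c("Thu Jan 14 19:44:46 0000 2016",
                    "Fri Jan 15 08:00:00 0000 2016"),
  stringsAsFactors = FALSE
)

# Rewrite " 0000 " as " +0000 " so that %z can parse the UTC offset.
fixed <- sub(" 0000 ", " +0000 ", combinedDF$tweetSendTime, fixed = TRUE)

# Convert the whole column in one call; no loop needed.
# tz = "UTC" is an assumption; it only controls how the times are displayed.
combinedDF$tweetSendTime <- as.POSIXct(
  fixed,
  format = "%a %b %d %H:%M:%S %z %Y",
  tz = "UTC"
)

# Comparisons against another POSIXct value now work on the whole column.
cutoff <- as.POSIXct("2016-01-15 00:00:00", tz = "UTC")  # hypothetical cutoff date
recent <- combinedDF[combinedDF$tweetSendTime > cutoff, , drop = FALSE]
```

Any value that fails to parse comes back as NA, so anyNA(combinedDF$tweetSendTime) is a quick check after converting the full 5,000,000 rows.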
Upvotes: 1