Reputation: 16074
I have a data.table of millions of rows, and one of the columns is a date column. I would like to add 12 months to every date in that column and store the result in a new column, so I use the dplyr and lubridate packages, e.g.:
library(dplyr)
library(lubridate)
new_data <- data %>% mutate(date12m = date %m+% months(12))
This works, but it is very slow on large datasets. Am I missing something? How can this be sped up? I generally don't expect R to run for more than 10 minutes on such a simple task.
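For comparison, here is a base-R sketch that adds months with the same end-of-month rollback documented for %m+% (a date whose day does not exist in the target month becomes that month's last day). It works on the whole vector at once. The helper name add_months is hypothetical, and it has not been benchmarked against lubridate here.

```r
# Hypothetical helper (not part of lubridate): add n months to a Date vector.
# If the original day does not exist in the target month, roll back to that
# month's last day, the convention documented for lubridate's %m+%.
add_months <- function(x, n) {
  lt <- as.POSIXlt(x)                              # year (since 1900), mon (0-11), mday
  total <- lt$year * 12L + lt$mon + as.integer(n)  # months since Jan 1900, shifted by n
  # First day of the target month and first day of the month after it
  first <- as.Date(sprintf("%04d-%02d-01", total %/% 12L + 1900L, total %% 12L + 1L),
                   format = "%Y-%m-%d")
  after <- as.Date(sprintf("%04d-%02d-01", (total + 1L) %/% 12L + 1900L, (total + 1L) %% 12L + 1L),
                   format = "%Y-%m-%d")
  month_len <- as.integer(after - first)           # number of days in the target month
  # Keep the original day of month, capped at the target month's length
  first + pmin(lt$mday, month_len) - 1L
}

add_months(as.Date(c("2014-01-31", "2012-02-29")), 12)
# [1] "2015-01-31" "2013-02-28"
```

With a data.table named data, the new column could then be added by reference with data[, date12m := add_months(date, 12)].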
Edit:
Note that my solution is already more efficient than the as.yearmon approach. Thanks to Colonel Beauvel for that solution:
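library(dplyr)
library(lubridate)
library(zoo)
a <- data.frame(date = rep(today(), 1000000))
# as.yearmon approach, vectorised: a plain if() would only test the first element
func = function(u) {
  d = as.Date(as.yearmon(u) + 1, frac = 1)  # last day of the same month, one year later
  day(d) = pmin(day(u), day(d))             # keep the original day, capped at that month's length
  d
}
pt <- proc.time()
a <- a %>% mutate(date12m = func(date))
data.table::timetaken(pt)
pt <- proc.time()
a <- a %>% mutate(date12m = date %m+% months(12))  # %m+% needs a Period, not a bare number
data.table::timetaken(pt)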
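When a column of millions of rows holds far fewer distinct dates (in the benchmark above every row is today()), one further option is to shift only the unique dates and map the results back with match(). This is a sketch under that assumption; apply_on_unique is a hypothetical helper, not part of dplyr or lubridate.

```r
# Hypothetical helper: apply a vectorised date-shifting function f to the
# distinct values of x only, then expand the result back to every row.
apply_on_unique <- function(x, f, ...) {
  u <- unique(x)            # often far fewer values than rows
  f(u, ...)[match(x, u)]    # position of each row's date among the unique ones
}

# Toy check with a simple shift (base R only)
d <- as.Date("2020-01-31") + c(0, 0, 1, 1, 0)
apply_on_unique(d, function(v, k) v + k, 10)
# [1] "2020-02-10" "2020-02-10" "2020-02-11" "2020-02-11" "2020-02-10"
```

With lubridate loaded, the question's column could be built as a$date12m <- apply_on_unique(a$date, `%m+%`, months(12)). How much this saves depends on how many distinct dates the data actually contains.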
Upvotes: 4
Reputation: 74
I am also working with big data frames in R. You can use the DescTools package, which has a function AddMonths() that takes the dates and the number of months to add.
It works quite well for me.
> library(lubridate)
> library(DescTools)
> a <- ymd("2011-09-9")
> b <- AddMonths(a, 1)
> b
[1] "2011-10-09"
Upvotes: 2
Reputation: 31181
Just add 1 with lubridate's month():
library(lubridate)
x = seq.Date(from = as.Date("2007-01-01"), to = as.Date("2014-12-12"), by = "day")
month(x) = month(x) + 1
#> head(x)
#[1] "2007-02-01" "2007-02-02" "2007-02-03" "2007-02-04" "2007-02-05" "2007-02-06"
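The same one-month increment can be written in base R by bumping the mon field of a POSIXlt. A caveat, shown here for this base-R version only: when the original day does not exist in the target month, the date is normalised forward rather than capped, so 31 January becomes 3 March. That is the kind of month-end case the zoo-based function below handles with its day check.

```r
# Base-R analogue of month(x) <- month(x) + 1 (assumes Date input)
x <- as.Date(c("2007-01-01", "2007-01-31", "2007-12-15"))
lt <- as.POSIXlt(x)       # broken-down time with year/mon/mday fields
lt$mon <- lt$mon + 1L     # mon is 0-based; December (11) + 1 wraps into the next year
shifted <- as.Date(lt)    # out-of-range fields are normalised on conversion
shifted
# [1] "2007-02-01" "2007-03-03" "2008-01-15"
```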
Edit: as per @akrun's comment, here is a solution using as.yearmon from the zoo package. The trick is to do a quick check against the day of the last date of the next month:
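library(zoo)
library(lubridate)  # for day() and day<-
func = function(u)
{
  d = as.Date(as.yearmon(u) + 1/12, frac = 1)  # last day of the next month
  if(day(u) > day(d)) return(d)                # day does not exist there: use the month end
  day(d) = day(u)                              # otherwise keep the original day
  d
}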
x=as.Date(c("2014-01-31","2015-02-28","2013-03-02"))
#> as.Date(sapply(x, func), origin = "1970-01-01")
#[1] "2014-02-28" "2015-03-28" "2013-04-02"
Upvotes: 5