Rgrvkfer
Rgrvkfer

Reputation: 395

Create new columns by adding a sequence of months to a date column

I have data with a date column:

library(data.table)
library(lubridate)

Y = data.table(X = c("2012-12-31", "2021-10-31"))

            X  
1: 2012-12-31
2: 2021-10-31                    

I want to add a sequence of months, from 1 to x, to the dates in column "X". The result should be a set of new columns, "X_1", "X_2", ... "X_x":

            X          X_1          X_2    X_x
1: 2012-12-31   2013-01-31   2013-02-28 .. (2012-12-31) %m+% months(x)
2: 2021-10-31   2021-11-30   2021-12-31 .. (2021-10-31) %m+% months(x)

I've tried several things that didn't work, such as:

  Y[ , c(paste0("X_", 1:15))] = AddMonths(Y$X, c(1:15)) %>% LastDayInMonth()
  Y[ , c(paste0("X_", 1:15)) := AddMonths(as.IDate(X), c(1:15)) %>% LastDayInMonth()]

I want to create 15 columns based on this operation, so I want to avoid for loops. My real data have 80K rows.

Upvotes: 2

Views: 722

Answers (4)

Henrik
Henrik

Reputation: 67778

Both %m+% and months are vectorized.

library(data.table)
library(lubridate)

n = 3 
nm = paste0("x", "_", seq(n))
m = rep(seq(n), each = nrow(d))
d[ , (nm) := split(x %m+% months(m), m)]
d
#             x        x_1        x_2        x_3
# 1: 2012-12-31 2013-01-31 2013-02-28 2013-03-31
# 2: 2021-10-31 2021-11-30 2021-12-31 2022-01-31

Explanation:

Set the maximum number of months in the sequence to be added to each value in the date column (e.g. n = 3). Create a vector of column names to which the result should be assigned (nm = paste0("x", "_", seq(n))). Create a vector of months to be added (m = rep(seq(n), each = nrow(d))) to the date column. Add the months in 'm' to the dates in 'x' (x %m+% months(m)). Split the result by the number of months added (split(x %m+% months(m), m)) to a list. Use the fundamental property of data.table, "each element of the list becomes a column in the resulting data.table", and assign the resulting columns by reference (:=).


d = data.table(x = as.Date(c("2012-12-31","2021-10-31")))

Upvotes: 2

TarJae
TarJae

Reputation: 78917

Update: I pasted a wrong version of the answer: months(1) should be months(i-1). Corrected now.

In a for loop we can do it like this:

library(lubridate) #%m+%
Y=data.table("X"=as.Date(c("2012-12-31","2021-10-31")))

for (i in 2:15) {
  Y <- Y %>% 
    mutate(col= X %m+% months(i-1))
  colnames(Y)[i] <- paste0("X_",i-1)
}
 X        X_1        X_2        X_3        X_4        X_5        X_6        X_7        X_8        X_9       X_10       X_11       X_12
1: 2012-12-31 2013-01-31 2013-02-28 2013-03-31 2013-04-30 2013-05-31 2013-06-30 2013-07-31 2013-08-31 2013-09-30 2013-10-31 2013-11-30 2013-12-31
2: 2021-10-31 2021-11-30 2021-12-31 2022-01-31 2022-02-28 2022-03-31 2022-04-30 2022-05-31 2022-06-30 2022-07-31 2022-08-31 2022-09-30 2022-10-31
         X_13       X_14
1: 2014-01-31 2014-02-28
2: 2022-11-30 2022-12-31

Upvotes: 0

jkatam
jkatam

Reputation: 3447

Please check this code

data.table("X"=c("2012-12-31","2021-10-31")) %>% mutate(date=as.Date(X), seq=row_number(), date2=date %m+% months(15)) %>% 
  group_by(seq) %>% 
  pivot_longer(starts_with('date'), names_to = 'name', values_to = 'date') %>% 
  tidyr::complete(date=full_seq(date, period = 1)) %>% fill(X) %>% 
  mutate(year=year(date), month=month(date)) %>% group_by(X,year, month) %>% 
  slice_tail(n=1) %>% ungroup %>% group_by(X) %>%
  mutate(row=row_number()) %>% 
  pivot_wider(X,names_from = row, names_prefix = 'X',values_from = date)

Created on 2023-01-28 with reprex v2.0.2

# A tibble: 2 × 17
# Groups:   X [2]
  X          X1         X2         X3         X4         X5         X6         X7         X8         X9         X10        X11       
  <chr>      <date>     <date>     <date>     <date>     <date>     <date>     <date>     <date>     <date>     <date>     <date>    
1 2012-12-31 2012-12-31 2013-01-31 2013-02-28 2013-03-31 2013-04-30 2013-05-31 2013-06-30 2013-07-31 2013-08-31 2013-09-30 2013-10-31
2 2021-10-31 2021-10-31 2021-11-30 2021-12-31 2022-01-31 2022-02-28 2022-03-31 2022-04-30 2022-05-31 2022-06-30 2022-07-31 2022-08-31
# … with 5 more variables: X12 <date>, X13 <date>, X14 <date>, X15 <date>, X16 <date>
# ℹ Use `colnames()` to see all variable names

Upvotes: 0

MarBlo
MarBlo

Reputation: 4524

As the question is tagged with dplyr I assume tidyverse is allowed, too.

Starting with a list for the 2 dates you have.

A function taking a date as input adds from starting month as many months as choosen in the additional argument.

The function is then applied with map to the list of initial dates. This is then transformed into a data.frame.

  • Example shows only n=6
  • Contains loop
library(tidyverse)
library(lubridate)

f <- function(x, n = 5) {
  yy <- x
  for (i in seq(n)) {
    yy <- append(yy, x %m+% months(i))
  }
  return(yy)
}

l <- list(("2012-12-31"), ("2021-10-31"))
l |>
  map(lubridate::as_date) |>
  map(n = 6, f) |>
  as.data.frame() |>
  setNames(l) |>
  t() |> as.data.frame()
#>                    V1         V2         V3         V4         V5         V6
#> 2012-12-31 2012-12-31 2013-01-31 2013-02-28 2013-03-31 2013-04-30 2013-05-31
#> 2021-10-31 2021-10-31 2021-11-30 2021-12-31 2022-01-31 2022-02-28 2022-03-31
#>                    V7
#> 2012-12-31 2013-06-30
#> 2021-10-31 2022-04-30

Upvotes: 0

Related Questions