Nik
Nik

Reputation: 51

create an incremental sequence for Duplicates by count

I have a data frame

library(data.table)
dt <- data.table(Customer = c("a", "a", "c"), 
  count = c(3, 4, 5), 
  Date = c("2019-02-01","2019-05-01","2019-10-01"))
  Customer count       Date
1:        a     3 2019-02-01
2:        a     4 2019-05-01
3:        c     5 2019-10-01

I am replicating this data frame by count using:

dt[rep(seq(1, nrow(dt)), dt$count)]
    Customer count       Date
 1:        a     3 2019-02-01
 2:        a     3 2019-02-01
 3:        a     3 2019-02-01
 4:        a     4 2019-05-01
 5:        a     4 2019-05-01
 6:        a     4 2019-05-01
 7:        a     4 2019-05-01
 8:        c     5 2019-10-01
 9:        c     5 2019-10-01
10:        c     5 2019-10-01
11:        c     5 2019-10-01
12:        c     5 2019-10-01

And further using months to increment the month field

dt %>%
  group_by(Customer) %>%
  mutate(Date = as.Date(Date[1]) + months(seq(0, length.out = n(), by = 3)))
  Customer count Date      
   <chr>    <dbl> <date>    
 1 a            3 2019-02-01
 2 a            3 2019-05-01
 3 a            3 2019-08-01
 4 a            4 2019-11-01
 5 a            4 2020-02-01
 6 a            4 2020-05-01
 7 a            4 2020-08-01
 8 c            5 2019-10-01
 9 c            5 2020-01-01
10 c            5 2020-04-01
11 c            5 2020-07-01
12 c            5 2020-10-01

However the date in line 4 is an increment of 3rd line (since its grouped by customer and in this case the customer names are same). I am looking to get an output that starts on "2019-05-01" which is the original date against the original data frame. Therefore, I'm hoping to get something like this:

  Customer count Date      
   <chr>    <dbl> <date>    
 1 a            3 2019-02-01
 2 a            3 2019-05-01
 3 a            3 2019-08-01
 4 a            4 2019-05-01
 5 a            4 2019-08-01
 6 a            4 2019-11-01
 7 a            4 2020-02-01
 8 c            5 2019-10-01
 9 c            5 2020-01-01
10 c            5 2020-04-01
11 c            5 2020-07-01
12 c            5 2020-10-01

Upvotes: 1

Views: 64

Answers (1)

Frank
Frank

Reputation: 66819

I think adding a row number and grouping on it should work:

dt %>% mutate(Date = as.Date(Date), rn = row_number()) %>% 
  slice(rep(row_number(), count)) %>%
  group_by(Customer, rn) %>%
  mutate(Date = seq(first(Date), by="3 months", length.out=n()))

# A tibble: 12 x 4
# Groups:   Customer, rn [3]
   Customer count Date          rn
   <chr>    <dbl> <date>     <int>
 1 a            3 2019-02-01     1
 2 a            3 2019-05-01     1
 3 a            3 2019-08-01     1
 4 a            4 2019-05-01     2
 5 a            4 2019-08-01     2
 6 a            4 2019-11-01     2
 7 a            4 2020-02-01     2
 8 c            5 2019-10-01     3
 9 c            5 2020-01-01     3
10 c            5 2020-04-01     3
11 c            5 2020-07-01     3
12 c            5 2020-10-01     3

Fwiw, here is the data.table analogue:

dt[, Date := as.IDate(Date)][, rn := .I]
res <- dt[rep(1:.N, count)][, 
  Date := seq(first(Date), by="3 months", length.out=.N)
, by=.(Customer, rn)][]

    Customer count       Date rn
 1:        a     3 2019-02-01  1
 2:        a     3 2019-05-01  1
 3:        a     3 2019-08-01  1
 4:        a     4 2019-05-01  2
 5:        a     4 2019-08-01  2
 6:        a     4 2019-11-01  2
 7:        a     4 2020-02-01  2
 8:        c     5 2019-10-01  3
 9:        c     5 2020-01-01  3
10:        c     5 2020-04-01  3
11:        c     5 2020-07-01  3
12:        c     5 2020-10-01  3

or more concisely (not possible with normal dplyr verbs):

dt[, Date := as.IDate(Date)][, rn := .I]
dt[, 
  .(Customer, count, Date = seq(first(Date), length.out=count, by="3 months"))
, by=.(rn)]

Upvotes: 1

Related Questions