Reputation: 51
I have a data frame
library(data.table)
dt <- data.table(Customer = c("a", "a", "c"),
count = c(3, 4, 5),
Date = c("2019-02-01","2019-05-01","2019-10-01"))
Customer count Date
1: a 3 2019-02-01
2: a 4 2019-05-01
3: c 5 2019-10-01
I am replicating this data frame by count using:
dt[rep(seq(1, nrow(dt)), dt$count)]
Customer count Date
1: a 3 2019-02-01
2: a 3 2019-02-01
3: a 3 2019-02-01
4: a 4 2019-05-01
5: a 4 2019-05-01
6: a 4 2019-05-01
7: a 4 2019-05-01
8: c 5 2019-10-01
9: c 5 2019-10-01
10: c 5 2019-10-01
11: c 5 2019-10-01
12: c 5 2019-10-01
And further using months
to increment the month field
dt %>%
group_by(Customer) %>%
mutate(Date = as.Date(Date[1]) + months(seq(0, length.out = n(), by = 3)))
Customer count Date
<chr> <dbl> <date>
1 a 3 2019-02-01
2 a 3 2019-05-01
3 a 3 2019-08-01
4 a 4 2019-11-01
5 a 4 2020-02-01
6 a 4 2020-05-01
7 a 4 2020-08-01
8 c 5 2019-10-01
9 c 5 2020-01-01
10 c 5 2020-04-01
11 c 5 2020-07-01
12 c 5 2020-10-01
However the date in line 4 is an increment of 3rd line (since its grouped by customer and in this case the customer names are same). I am looking to get an output that starts on "2019-05-01" which is the original date against the original data frame. Therefore, I'm hoping to get something like this:
Customer count Date
<chr> <dbl> <date>
1 a 3 2019-02-01
2 a 3 2019-05-01
3 a 3 2019-08-01
4 a 4 2019-05-01
5 a 4 2019-08-01
6 a 4 2019-11-01
7 a 4 2020-02-01
8 c 5 2019-10-01
9 c 5 2020-01-01
10 c 5 2020-04-01
11 c 5 2020-07-01
12 c 5 2020-10-01
Upvotes: 1
Views: 64
Reputation: 66819
I think adding a row number and grouping on it should work:
dt %>% mutate(Date = as.Date(Date), rn = row_number()) %>%
slice(rep(row_number(), count)) %>%
group_by(Customer, rn) %>%
mutate(Date = seq(first(Date), by="3 months", length.out=n()))
# A tibble: 12 x 4
# Groups: Customer, rn [3]
Customer count Date rn
<chr> <dbl> <date> <int>
1 a 3 2019-02-01 1
2 a 3 2019-05-01 1
3 a 3 2019-08-01 1
4 a 4 2019-05-01 2
5 a 4 2019-08-01 2
6 a 4 2019-11-01 2
7 a 4 2020-02-01 2
8 c 5 2019-10-01 3
9 c 5 2020-01-01 3
10 c 5 2020-04-01 3
11 c 5 2020-07-01 3
12 c 5 2020-10-01 3
Fwiw, here is the data.table analogue:
dt[, Date := as.IDate(Date)][, rn := .I]
res <- dt[rep(1:.N, count)][,
Date := seq(first(Date), by="3 months", length.out=.N)
, by=.(Customer, rn)][]
Customer count Date rn
1: a 3 2019-02-01 1
2: a 3 2019-05-01 1
3: a 3 2019-08-01 1
4: a 4 2019-05-01 2
5: a 4 2019-08-01 2
6: a 4 2019-11-01 2
7: a 4 2020-02-01 2
8: c 5 2019-10-01 3
9: c 5 2020-01-01 3
10: c 5 2020-04-01 3
11: c 5 2020-07-01 3
12: c 5 2020-10-01 3
or more concisely (not possible with normal dplyr verbs):
dt[, Date := as.IDate(Date)][, rn := .I]
dt[,
.(Customer, count, Date = seq(first(Date), length.out=count, by="3 months"))
, by=.(rn)]
Upvotes: 1