Reputation: 33
I have a dataset in R related to the effective start and end month (duration) of price. Below is an example:
print(df)
Customer Product Price Start_Month End_Month
ABC XYZ 100 Jan Jun
ABC XYZ 150 Jul Dec
I need to write code/function that can expend this table by each month. My expected result is as below:
Customer Product Price Month
ABC XYZ 100 Jan
ABC XYZ 100 Feb
ABC XYZ 100 Mar
ABC XYZ 100 Apr
ABC XYZ 100 May
ABC XYZ 100 Jun
ABC XYZ 150 Jul
ABC XYZ 150 Aug
ABC XYZ 150 Sep
ABC XYZ 150 Oct
ABC XYZ 150 Nov
ABC XYZ 150 Dec
I couldn't find other similar questions that can help me with my problem.
Upvotes: 2
Views: 167
Reputation: 388947
A tidyverse
approach would be to convert dataframe to long format so that we now have two rows for every one row in original dataframe. Now for every row we use match
to get the index of the month value in start and end month and then using that create a sequence.
library(tidyverse)
df %>%
gather(key, Month, -(1:3)) %>%
group_by_at(1:3) %>%
complete(Month = month.abb[match(Month[1], month.abb):
match(Month[2], month.abb)]) %>%
arrange(Customer, Product, Price, match(Month, month.abb)) %>%
select(-key)
# Customer Product Price Month
# <fct> <fct> <int> <chr>
# 1 ABC XYZ 100 Jan
# 2 ABC XYZ 100 Feb
# 3 ABC XYZ 100 Mar
# 4 ABC XYZ 100 Apr
# 5 ABC XYZ 100 May
# 6 ABC XYZ 100 Jun
# 7 ABC XYZ 150 Jul
# 8 ABC XYZ 150 Aug
# 9 ABC XYZ 150 Sep
#10 ABC XYZ 150 Oct
#11 ABC XYZ 150 Nov
#12 ABC XYZ 150 Dec
Or another option using map2
df %>%
mutate(Month = map2(Start_Month, End_Month,
~month.abb[match(.x, month.abb) : match(.y, month.abb)])) %>%
unnest() %>%
select(-Start_Month, -End_Month)
which in base R would be using Map
do.call(rbind, Map(function(x, y, z) cbind(df[z,],
Month = month.abb[match(x, month.abb) : match(y, month.abb)]),
df$Start_Month, df$End_Month, seq_len(nrow(df))))
Here, we exploit the inbuilt month.abb
vector to get the sequence
month.abb
# [1] "Jan" "Feb" "Mar" "Apr" "May" "Jun" "Jul" "Aug" "Sep" "Oct" "Nov" "Dec"
Upvotes: 3
Reputation: 4480
Using base r:
do.call(rbind,lapply(1:nrow(df),
function(x) {
cbind(df[x],
data.frame(Months=
month.abb[which(month.abb==df[x]$Start_Month):
which(month.abb==df[x]$End_Month)]))
}))
Upvotes: 1