Reputation: 115
I have a data frame with daily channel revenue from multiple channels. The data frame looks like the following:
orders_dataframe:
Order |Channel | Revenue |
1 |TV | 120 |
2 |Email | 30 |
3 |Retail | 300 |
4 |Shop1 | 50 |
5 |Shop2 | 90 |
6 |Email | 20 |
7 |Retail | 250 |
What I would like to do is to split those revenues coming from Retail and divide them between Shop1 and Shop2 according to a predefined ratio (e.g., 60%/40% split). For example, I would like that all rows with revenue coming from "Retail" get attributed 60% to Shop1 and 40% to Shop2. This can be reflected by replacing all retail-revenue rows with two new rows, as seen for Order 3 and Order 7 in the final table I want to get below:
orders_dataframe:
Order |Channel | Revenue |
1 |TV | 120 |
2 |Email | 30 |
3 |Shop1 | 180 |
3 |Shop2 | 120 |
4 |Shop1 | 50 |
5 |Shop2 | 90 |
6 |Email | 20 |
7 |Shop1 | 150 |
7 |Shop2 | 100 |
Ideally, since I am performing this with various datasets, I would like to take the percentages from a data frame (split_dataframe) instead of manually assigning the figures 60% and 40%. I would like to use the figures from a dataset like below:
split_dataframe:
Channel |Percent |
Shop1 |60% |
Shop2 |40% |
Here is a reproducible example of the two data frames:
orders_dataframe <- data.frame(Order = c(1,2,3,4,5,6,7),
Channel = c("TV", "Email", "Retail", "Shop1", "Shop2", "Email", "Retail"),
Revenue = c(120,30,300,50,90,20,250))
split_dataframe <- data.frame(Channel = c("Shop1", "Shop2"),
Percent = c(0.6, 0.4))
Thank you very much!
Upvotes: 1
Views: 59
Reputation: 278
You can do this in base R.
orders_dataframe <- data.frame(Order = c(1,2,3,4,5,6,7),
Channel = c("TV", "Email", "Retail", "Shop1", "Shop2", "Email", "Retail"),
Revenue = c(120,30,300,50,90,20,250))
# Coerce the channel factor to a string.
# Do you really want this as a factor?
orders_dataframe$Channel <- as.character(orders_dataframe$Channel)
# Create a vector of the replacement values.
# The prob = c() argument lets you pick the
# probabilities of each replacement.
replacement <- sample(x = c("Store1","Store2"),
size = length(which(orders_dataframe$Channel == "Retail")),
replace = TRUE, prob = c(0.6, 0.4))
# Replace the Channel columnn with the replacement vector.
orders_dataframe$Channel[which(orders_dataframe$Channel == "Retail")] <- replacement
Upvotes: 0
Reputation: 3755
With dplyr
,
split_dataframe %>%
mutate(Index="Retail") %>%
merge(.,orders_dataframe,by.x="Index",by.y="Channel") %>%
mutate(Revenue=Revenue*Percent) %>%
select(Order,Channel,Revenue) %>%
bind_rows(orders_dataframe %>% filter(Channel !="Retail"),.)%>%
arrange(.,Order)
gives,
Order Channel Revenue
1 1 TV 120
2 2 Email 30
3 3 Shop1 180
4 3 Shop2 120
5 4 Shop1 50
6 5 Shop2 90
7 6 Email 20
8 7 Shop1 150
9 7 Shop2 100
Upvotes: 1
Reputation: 27802
Here is a data.table
approach... see comments in code for explanation
library( data.table )
#make them data.tables
setDT( orders_dataframe ); setDT( split_dataframe )
#split to retail en non-retail orders
orders_retail <- orders_dataframe[ Channel == "Retail", ]
orders_no_retail <- orders_dataframe[ !Channel == "Retail", ]
#divide the retail orders over the two shops (multiple steps)
#create a new colum by shop
shop_cols <- split_dataframe$Channel
orders_retail[, (shop_cols) := Revenue ]
#melt to long format
orders_retail.melt <- melt( orders_retail,
id.vars = "Order",
measure.vars = (shop_cols),
variable.name = "Channel",
value.name = "Revenue")
#and update the molten data with the percentages in the split_dataframe
orders_retail.melt[ split_dataframe,
Revenue := Revenue * i.Percent,
on = .( Channel )]
#merge everything back together and order on Order id
ans <- rbind( orders_no_retail, orders_retail.melt )
setorder( ans, Order )
# Order Channel Revenue
# 1: 1 TV 120
# 2: 2 Email 30
# 3: 3 Shop1 180
# 4: 3 Shop2 120
# 5: 4 Shop1 50
# 6: 5 Shop2 90
# 7: 6 Email 20
# 8: 7 Shop1 150
# 9: 7 Shop2 100
Upvotes: 0