Reputation: 47
I found a similar question and response in Python (identify records that make up 90% of total) but couldn't quite translate it to R.
I'm trying to figure out the least number of products that make up at least 80% (this would be a variable since the % can change) of sales.
For example:
Product Sales
A 100
B 40
C 10
D 15
Total 165
The answer should be that I can get to 132 (80% of sales) by identifying two items. The output should look like this:
Product Sales
A 100
B 40
Any help you can provide would be greatly appreciated!
Upvotes: 1
Views: 276
Reputation: 16
s_t answer is simple and efficient but if you are looking for a base R solution and a function:
example <- data.frame(Product = c("A", "B", "C", "D"), Sales = c(100, 40, 10, 15))
min.products <- function(Product, Sales, percent){
df <- data.frame(Product, Sales)
minimum <- percent*sum(df$Sales)
df <- df[order(-df$Sales), ]
lowest.score <- df$Sales[cumsum(df$Sales)>minimum][1]
answer <- df$Product[df$Sales>=lowest.score]
return(answer)
}
min.products(example$Product, example$Sales, 0.8)
Upvotes: 0
Reputation: 9485
What about a dplyr
solution:
Edit:
Here a solution that seems to fit:
# your threshold
constant <- 0.5
data %>%
# order
arrange(-Sales)%>%
# add the cumulative
mutate(cumulative = round(cumsum(Sales)/sum(Sales),2),
# add a threshold, the difference between the constant and the cumulative
threshold = round(cumsum(Sales)/sum(Sales),2)- constant) %>%
# last, find all above the min value positive under the threshold
filter(threshold <= min(.$threshold[.$threshold > 0]))
# for 0.8
Product Sales cumulative threshold
1 A 100 0.61 -0.19
2 B 40 0.85 0.05
# for 0.5
Product Sales cumulative threshold
1 A 100 0.61 -0.19
With data:
data <- read.table(text ="Product Sales
A 100
B 40
C 10
D 15", header = T)
Upvotes: 1