Identify products that make up 80% of total

Question

I found a similar question and response in Python (identify records that make up 90% of total) but couldn't quite translate it to R.

I'm trying to find the smallest number of products that make up at least 80% of sales (the percentage should be a variable, since it can change).

For example:

Product  Sales
A        100
B        40
C        10
D        15 
Total    165

The answer should be that I can get to 132 (80% of sales) by identifying two items. The output should look like this:

Product  Sales
A        100
B        40

Any help you can provide would be greatly appreciated!

s__ · Accepted Answer

Here is a dplyr solution that seems to fit:

library(dplyr)

# your threshold (set to 0.8 or 0.5 to reproduce the outputs below)
constant <- 0.8

data %>%
  # order by Sales, largest first
  arrange(desc(Sales)) %>%
  # add the cumulative share of total sales
  mutate(cumulative = round(cumsum(Sales) / sum(Sales), 2),
         # distance between the cumulative share and the threshold
         threshold = cumulative - constant) %>%
  # keep every row up to and including the first one that reaches the threshold
  filter(threshold <= min(.$threshold[.$threshold >= 0]))

# for 0.8
  Product Sales cumulative threshold
1       A   100       0.61     -0.19
2       B    40       0.85      0.05

# for 0.5
  Product Sales cumulative threshold
1       A   100       0.61      0.11

With data:

data <- read.table(text ="Product  Sales
A        100
B        40
C        10
D        15", header = TRUE)
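
If you would rather avoid dplyr, the same idea can be sketched in base R. This is only an illustration, not part of the solution above: the helper name top_share and its arguments are made up here, and it assumes a data frame with a numeric Sales column. It sorts by sales, computes the running share of the total without rounding, and keeps rows up to the first one that reaches the threshold.

```r
# Hypothetical helper (name and arguments are assumptions for illustration):
# returns the fewest top-selling rows whose combined share of Sales
# reaches `threshold`.
top_share <- function(df, threshold = 0.8) {
  # sort from largest to smallest sales
  sorted <- df[order(df$Sales, decreasing = TRUE), ]
  # running share of the grand total after each row
  share <- cumsum(sorted$Sales) / sum(sorted$Sales)
  # position of the first row at which the share reaches the threshold
  n <- which(share >= threshold)[1]
  # fall back to all rows if floating-point error keeps the share below it
  if (is.na(n)) n <- nrow(sorted)
  sorted[seq_len(n), ]
}

data <- read.table(text = "Product Sales
A 100
B 40
C 10
D 15", header = TRUE)

top_share(data, 0.8)  # rows A and B
top_share(data, 0.5)  # row A only
```

Because the share is not rounded before the comparison, a product is only counted as reaching the threshold when the exact cumulative share does.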
