Reputation:
I have the sales and cost data by models. The code below select TOP10 models by sales and all the rest are sum up in new category "Others" which is 11th row.
df <- data.frame (model = c("A","B","C","D","E","F","G","H","I","J","K","L","M","N"),
sale = c(100,300,140,456,345,456,456,780,40,560,560,456,350,500),
cost = c(1340,330,440,443,365,437,478,700,30,460,160,456,650,100))
#TOP10 by sale
order <- df %>%
type.convert(as.is = TRUE) %>%
mutate(pos = row_number(desc(sale)),
model = ifelse(pos>10, 'Others', model),
pos = ifelse(pos>10, 11, pos)) %>%
group_by(model, pos) %>%
summarise(cost= sum(cost), sale= sum(sale), .groups = 'drop') %>%
arrange(pos)
Output:
model pos cost sale
1 H 1 700 780
2 J 2 460 560
3 K 3 160 560
4 N 4 100 500
5 D 5 443 456
6 F 6 437 456
7 G 7 478 456
8 L 8 456 456
9 M 9 650 350
10 E 10 365 345
11 Others 11 2140 580
The sale of category A is put in "other" since it has low a sale (100) and is not in TOP10. Now, I want to include A in this TOP 10 in any case, no matter how much sale it has. So output should be TOP9 + 'A' + 'Others':
Expected output:
model pos cost sale
1 H 1 700 780
2 J 2 460 560
3 K 3 160 560
4 N 4 100 500
5 D 5 443 456
6 F 6 437 456
7 G 7 478 456
8 L 8 456 456
9 M 9 650 350
10 A 10 100 1340
11 Others 11 2140 580
(Thus change must be done in the given code)
Upvotes: 1
Views: 72
Reputation: 742
To be able to automate it, I created a function for you. You can easily specify the parameters and get the output quickly. Also, this function can be modified at a later stage based on your requirements:
Top10BySales = function(DataFrame,TopN=10, IncludeModels) {
## Dataframe -> User needs to specify the dataframe
## TopN -> Do you want to limit the analysis by top 10 or change it? Default value is 10
## IncludeModels -> Here you will specify which models you want to include
## Ranking the models based on sales
df1 = DataFrame %>%
arrange(desc(sale)) %>%
mutate(Ranking = 1:nrow(DataFrame),
Include = ifelse(Ranking<=TopN, model, "Other"))
## Grouping the models
df2 = df1 %>%
mutate(Ranking = ifelse(Include!="Other", Ranking, 10+1)) %>%
group_by(Include,Ranking) %>%
summarise(cost= sum(cost), sale= sum(sale), .groups = 'drop')
## Checking the length of IncludeModelsVector
if (length(IncludeModels) == 0) {
df2=df2
} else {
df3 = data.frame(ModelNames = IncludeModels)
df3$Inclusion = ifelse(df3$ModelNames %in% df1$Include, "Yes", "No")
df3 = df3 %>% filter(Inclusion=="No")
df_original = df1 %>% filter(model %in% df3$ModelNames)
df_original$Num = 1:nrow(df_original)
for (i in 1:nrow(df_original)) {
df2[nrow(df2)-df_original[i,]$Num,] = df_original[i,c(1,4,3,2)] ## Replacing the values
}
}
return(df2 %>% arrange(Ranking))
}
To use this function, please see the picture below:
Hope this helps!
Upvotes: 1
Reputation: 346
You can mutate in 2 steps, before grouping and manipulate the pos variable to fix the order. This solves the problem in the comments to the other answer.
order <- df %>%
type.convert(as.is = TRUE) %>%
mutate(pos = row_number(desc(sale))) %>%
mutate(pos = ifelse(model == "A" & pos > 10, 11, ifelse(pos > 10, 12, pos)),
model = ifelse(pos>11, 'Others', model)) %>%
group_by(model, pos) %>%
summarise(cost= sum(cost), sale= sum(sale), .groups = 'drop') %>%
arrange(pos)
Upvotes: 0
Reputation: 78927
One way could be making use of bind_rows
after removing the 10th line and adding only where model
== A:
library(tidyverse)
#TOP10 by sale
df %>%
type.convert(as.is = TRUE) %>%
mutate(pos = row_number(desc(sale)),
model = ifelse(pos>10, 'Others', model),
pos = ifelse(pos>10, 11, pos)) %>%
group_by(model, pos) %>%
summarise(cost= sum(cost), sale= sum(sale), .groups = 'drop') %>%
arrange(pos) %>%
slice(-10) %>%
bind_rows(df %>%
filter(model == "A")) %>%
mutate(pos = replace_na(pos, 10)) %>%
arrange(pos)
model pos cost sale
<chr> <dbl> <dbl> <dbl>
1 H 1 700 780
2 J 2 460 560
3 K 3 160 560
4 N 4 100 500
5 D 5 443 456
6 F 6 437 456
7 G 7 478 456
8 L 8 456 456
9 M 9 650 350
10 A 10 1340 100
11 Others 11 2140 580
Upvotes: 0