user18937485
user18937485

Reputation:

How to select customized TOP10 rows by variable?

I have the sales and cost data by models. The code below select TOP10 models by sales and all the rest are sum up in new category "Others" which is 11th row.

df <- data.frame (model  = c("A","B","C","D","E","F","G","H","I","J","K","L","M","N"),
                 sale = c(100,300,140,456,345,456,456,780,40,560,560,456,350,500),
                 cost = c(1340,330,440,443,365,437,478,700,30,460,160,456,650,100))


#TOP10 by sale
order <- df %>%
  type.convert(as.is = TRUE) %>%
  mutate(pos = row_number(desc(sale)),
         model = ifelse(pos>10, 'Others', model),
         pos = ifelse(pos>10, 11, pos)) %>%
  group_by(model, pos) %>%
  summarise(cost= sum(cost), sale= sum(sale), .groups = 'drop') %>%
  arrange(pos)

Output:

   model    pos  cost  sale
 1 H          1   700   780
 2 J          2   460   560
 3 K          3   160   560
 4 N          4   100   500
 5 D          5   443   456
 6 F          6   437   456
 7 G          7   478   456
 8 L          8   456   456
 9 M          9   650   350
10 E         10   365   345
11 Others    11  2140   580

The sale of category A is put in "other" since it has low a sale (100) and is not in TOP10. Now, I want to include A in this TOP 10 in any case, no matter how much sale it has. So output should be TOP9 + 'A' + 'Others':

Expected output:

   model    pos  cost  sale
 1 H          1   700   780
 2 J          2   460   560
 3 K          3   160   560
 4 N          4   100   500
 5 D          5   443   456
 6 F          6   437   456
 7 G          7   478   456
 8 L          8   456   456
 9 M          9   650   350
10 A         10   100   1340
11 Others    11  2140   580

(Thus change must be done in the given code)

Upvotes: 1

Views: 72

Answers (3)

Deepansh Arora
Deepansh Arora

Reputation: 742

To be able to automate it, I created a function for you. You can easily specify the parameters and get the output quickly. Also, this function can be modified at a later stage based on your requirements:

Top10BySales = function(DataFrame,TopN=10, IncludeModels) {
## Dataframe -> User needs to specify the dataframe 
## TopN -> Do you want to limit the analysis by top 10 or change it? Default value is 10
## IncludeModels -> Here you will specify which models you want to include

## Ranking the models based on sales
  df1 = DataFrame %>%
    arrange(desc(sale)) %>%
    mutate(Ranking = 1:nrow(DataFrame),
           Include = ifelse(Ranking<=TopN, model, "Other")) 

## Grouping the models  
  df2 = df1 %>%
    mutate(Ranking = ifelse(Include!="Other", Ranking, 10+1)) %>%
    group_by(Include,Ranking) %>%
    summarise(cost= sum(cost), sale= sum(sale), .groups = 'drop') 
  
## Checking the length of IncludeModelsVector
  if (length(IncludeModels) == 0) {
    df2=df2
  } else {
    df3 = data.frame(ModelNames = IncludeModels)
    df3$Inclusion = ifelse(df3$ModelNames %in% df1$Include, "Yes", "No")
    df3 = df3 %>% filter(Inclusion=="No")
    df_original = df1 %>% filter(model %in% df3$ModelNames)
    df_original$Num = 1:nrow(df_original)
    for (i in 1:nrow(df_original)) {
      df2[nrow(df2)-df_original[i,]$Num,] = df_original[i,c(1,4,3,2)] ## Replacing the values
    }
  }
  
  return(df2 %>% arrange(Ranking))
  
  }

Using the function

To use this function, please see the picture below:

enter image description here

Hope this helps!

Upvotes: 1

akshaymoorthy
akshaymoorthy

Reputation: 346

You can mutate in 2 steps, before grouping and manipulate the pos variable to fix the order. This solves the problem in the comments to the other answer.

order <- df %>%
  type.convert(as.is = TRUE) %>%
  mutate(pos = row_number(desc(sale))) %>% 
  mutate(pos = ifelse(model == "A" & pos > 10, 11, ifelse(pos > 10, 12, pos)),
         model = ifelse(pos>11, 'Others', model)) %>%
  group_by(model, pos) %>%
  summarise(cost= sum(cost), sale= sum(sale), .groups = 'drop') %>%
  arrange(pos)

Upvotes: 0

TarJae
TarJae

Reputation: 78927

One way could be making use of bind_rows after removing the 10th line and adding only where model == A:

library(tidyverse)
#TOP10 by sale
 df %>%
   type.convert(as.is = TRUE) %>%
   mutate(pos = row_number(desc(sale)),
         model = ifelse(pos>10, 'Others', model),
         pos = ifelse(pos>10, 11, pos)) %>%
   group_by(model, pos) %>%
   summarise(cost= sum(cost), sale= sum(sale), .groups = 'drop') %>%
   arrange(pos) %>% 
   slice(-10) %>% 
   bind_rows(df %>% 
               filter(model == "A")) %>% 
   mutate(pos = replace_na(pos, 10)) %>% 
   arrange(pos)
   model    pos  cost  sale
   <chr>  <dbl> <dbl> <dbl>
 1 H          1   700   780
 2 J          2   460   560
 3 K          3   160   560
 4 N          4   100   500
 5 D          5   443   456
 6 F          6   437   456
 7 G          7   478   456
 8 L          8   456   456
 9 M          9   650   350
10 A         10  1340   100
11 Others    11  2140   580

Upvotes: 0

Related Questions