DSGym
DSGym

Reputation: 2867

R dplyr - arrange row order dynamically

df <- data.frame(
    company = c("a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "o", "p"),
    EUR = c(1000, 700, 200, 90, 120, 200, 90, 150, 120, 210, 100, 120, 200, 50, 70)
) 

df <- df %>%
    mutate(company = as.character(company)) %>%
    mutate(company = ifelse(row_number() > 10, "others", company)) %>%
    mutate(company = as.factor(company)) %>%
    group_by(company) %>%
    summarise(EUR = sum(EUR, na.rm = TRUE)) %>%
    arrange(desc(EUR))
df

# A tibble: 11 x 2
   company   EUR
   <fct>   <dbl>
 1 a        1000
 2 b         700
 3 others    540
 4 j         210
 5 c         200
 6 f         200
 7 h         150
 8 e         120
 9 i         120
10 d          90
11 g          90

I have this pretty common task. I want to get the top10 company by spendings and summarise the other companies to "other". I know there is an option to manually reorder the rows by changing it to a factor variable and then reordering the levels, but this doesn't work because others can always be in a different place and I have to do this operation for many markets in many different countries. So "Others" should always be at the last position, no matter which line the category is in. How do I do that?

Upvotes: 3

Views: 263

Answers (2)

tmfmnk
tmfmnk

Reputation: 40171

You can also try:

df %>%
 arrange(company == "others", desc(EUR))

   company   EUR
   <fct>   <dbl>
 1 a        1000
 2 b         700
 3 j         210
 4 c         200
 5 f         200
 6 h         150
 7 e         120
 8 i         120
 9 d          90
10 g          90
11 others    540

Upvotes: 4

Ronak Shah
Ronak Shah

Reputation: 389325

You could find out the row number of "others" and rearrange using slice

library(dplyr)

df %>%
  slice({i <- which.max(company == "others"); c(setdiff(seq_len(n()), i), i)})

#   company   EUR
#   <fct>   <dbl>
# 1 a        1000
# 2 b         700
# 3 j         210
# 4 c         200
# 5 f         200
# 6 h         150
# 7 e         120
# 8 i         120
# 9 d          90
#10 g          90
#11 others    540

Same logic in base R would be

i <- which.max(df$company == "others")
df[c(setdiff(seq_len(nrow(df)), i), i), ]

Upvotes: 1

Related Questions