LBZR
LBZR

Reputation: 181

Sort dataframe by custom order

I have a dataframe that contains a rating column.

I would like to sort using a custom order, e.g. AAA, AA, A, BBB, BB, B

However, default R sorting (using dplyr::arrange) results in A AA AAA B BB BBB

data.frame(Rating=c('AAA','AA','A','B','BB','BBB'),
           Value1=c(1,2,3,4,5,6),
           Value2=c(2,3,4,5,3,2)) %>% 
           arrange(Rating)

I found many links referring to same problem but they are not related to dataframe eg customize the sort function in R

How can I sort my data using a custom order?

Upvotes: 1

Views: 846

Answers (3)

IceCreamToucan
IceCreamToucan

Reputation: 28685

If you want to keep the column as character rather than factor, you can arrange based on a match to the order vector

rating_order <- c("AAA", "AA", "A", "BBB", "BB", "B")

df <-
  data.frame(
    Rating = c("A", "AA", "AAA", "B", "BB", "BBB"),
    Value1 = c(1, 2, 3, 4, 5, 6),
    Value2 = c(2, 3, 4, 5, 3, 2)
  )

library(dplyr, warn.conflicts = FALSE)

df %>%
  arrange(match(Rating, rating_order))
#>   Rating Value1 Value2
#> 1    AAA      3      4
#> 2     AA      2      3
#> 3      A      1      2
#> 4    BBB      6      2
#> 5     BB      5      3
#> 6      B      4      5

Created on 2022-01-20 by the reprex package (v2.0.1)

Upvotes: 2

user2554330
user2554330

Reputation: 44867

If you know the complete list of possible ratings, the easiest way is to make it a factor with the values in order, e.g.

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
data.frame(Rating=factor(c('A', 'AA','AAA','B','BB','BBB'),
                         levels=c('AAA','AA','A','BBB','BB','B')),
       Value1=c(1,2,3,4,5,6),
       Value2=c(2,3,4,5,3,2)) %>% 
       arrange(Rating)
#>   Rating Value1 Value2
#> 1    AAA      3      4
#> 2     AA      2      3
#> 3      A      1      2
#> 4    BBB      6      2
#> 5     BB      5      3
#> 6      B      4      5

Created on 2022-01-20 by the reprex package (v2.0.1)

Upvotes: 1

mhovd
mhovd

Reputation: 4067

Here is one approach using dplyr. In short, first sort by the letter grade, and then again by the number of letters. This would not work for ratings such as AAB, but from what I gather from your example this isn't the case.

library(dplyr)

data.frame(Rating=c('AAA','AA','A','B','BB','BBB'),
           Value1=c(1,2,3,4,5,6),
           Value2=c(2,3,4,5,3,2)) %>%
  mutate(grade = substr(Rating, 1,1), # Create a column with letter grade
         count = nchar(Rating)) %>%   # Create a column with number of letters
  arrange(grade, count) %>%           # Sort by grade, then count
  select(-grade, count)               # Optional, removes intermediary columns

#>   Rating Value1 Value2 count
#> 1      A      3      4     1
#> 2     AA      2      3     2
#> 3    AAA      1      2     3
#> 4      B      4      5     1
#> 5     BB      5      3     2
#> 6    BBB      6      2     3

Created on 2022-01-20 by the reprex package (v0.3.0)

Upvotes: 1

Related Questions