Canovice
Canovice

Reputation: 10163

Multiple value rankings in a tidy dataframe in R using dplyr

Sometimes I try to create a title that covers the entire question, but I had some difficulty wording the question, and want to go straight into an example of what I'm trying to get done. First, a subset of my dataframe, which includes some sports data:

dput(mydf)
structure(list(team.Abbreviation = c("ATL", "BOS", "BRO", "CHA", 
"CHI", "ATL", "BOS", "BRO", "CHA", "CHI", "ATL", "BOS", "BRO", 
"CHA", "CHI"), stat = c("GP", "GP", "GP", "GP", "GP", "PTS", 
"PTS", "PTS", "PTS", "PTS", "REB", "REB", "REB", "REB", "REB"
), value = c(28, 30, 27, 27, 27, 103.5, 103.9, 108.2, 104.7, 
97.6, 47.6, 53, 54.7, 56.8, 51.7), foragainst = c("for", "for", 
"for", "for", "for", "for", "for", "for", "for", "for", "for", 
"for", "for", "for", "for")), .Names = c("team.Abbreviation", 
"stat", "value", "foragainst"), row.names = c(NA, -15L), class = c("tbl_df", 
"tbl", "data.frame"))

mydf
# A tibble: 15 x 4
    team.Abbreviation  stat value foragainst
               <chr> <chr> <dbl>      <chr>
 1               ATL    GP  28.0        for
 2               BOS    GP  30.0        for
 3               BRO    GP  27.0        for
 4               CHA    GP  27.0        for
 5               CHI    GP  27.0        for
 6               ATL   PTS 103.5        for
 7               BOS   PTS 103.9        for
 8               BRO   PTS 108.2        for
 9               CHA   PTS 104.7        for
10               CHI   PTS  97.6        for
11               ATL   REB  47.6        for
12               BOS   REB  53.0        for
13               BRO   REB  54.7        for
14               CHA   REB  56.8        for
15               CHI   REB  51.7        for

The foragainst column can be ignored for the time being. For each stat (GP, PTS, REB in this case), I'd like to compute each team's ranking in that stat. There are 5 teams in this example. I am fairly certain that what I want is a dataframe with the same dimensions as mydf, that looks as such:

outputdf
# A tibble: 15 x 4
    team.Abbreviation  stat rank  foragainst
               <chr> <chr> <dbl>      <chr>
 1               ATL    GP     2        for
 2               BOS    GP     1        for
 3               BRO    GP     3        for
 4               CHA    GP     3        for
 5               CHI    GP     3        for
 6               ATL   PTS     4        for
 7               BOS   PTS     3        for
 8               BRO   PTS     1        for
 9               CHA   PTS     2        for
10               CHI   PTS     5        for
11               ATL   REB     5        for
12               BOS   REB     3        for
13               BRO   REB     2        for
14               CHA   REB     1        for
15               CHI   REB     4        for

Examining the slice of 5 rows of this data where stat == PTS, notice that team.Abbrevation == BRO had the highest number of PTS, therefore it's rank is 1. CHI had the lowest number of PTS, therefore it's rank is 5. I don't particularly care how ties are handled, so it's not necessarily the case that the rank for BRO, CHA, and CHI must == 3 for the stat == GP.

I can probably get this done in a fairly inefficient manner with a for-loop, but I would like to find a dplyr (or other good packages) solution here. Thanks in advance!

Upvotes: 1

Views: 68

Answers (1)

akrun
akrun

Reputation: 887028

We can use min_rank

library(dplyr)
mydf %>% 
    group_by(stat) %>% 
    mutate(rank = min_rank(-value)) %>% 
    select(team.Abbreviation, stat, rank, foragainst)
# A tibble: 15 x 4
# Groups:   stat [3]
#   team.Abbreviation  stat  rank foragainst
#               <chr> <chr> <int>      <chr>
# 1               ATL    GP     2        for
# 2               BOS    GP     1        for
# 3               BRO    GP     3        for
# 4               CHA    GP     3        for
# 5               CHI    GP     3        for
# 6               ATL   PTS     4        for
# 7               BOS   PTS     3        for
# 8               BRO   PTS     1        for
# 9               CHA   PTS     2        for
#10               CHI   PTS     5        for
#11               ATL   REB     5        for
#12               BOS   REB     3        for
#13               BRO   REB     2        for
#14               CHA   REB     1        for
#15               CHI   REB     4        for

Or using ave from base R

with(mydf, ave(-value, stat, FUN = function(x) rank(x, ties.method = "min")))

Upvotes: 3

Related Questions