Reputation: 10163
Sometimes I try to create a title that covers the entire question, but I had some difficulty wording the question, and want to go straight into an example of what I'm trying to get done. First, a subset of my dataframe, which includes some sports data:
dput(mydf)
structure(list(team.Abbreviation = c("ATL", "BOS", "BRO", "CHA",
"CHI", "ATL", "BOS", "BRO", "CHA", "CHI", "ATL", "BOS", "BRO",
"CHA", "CHI"), stat = c("GP", "GP", "GP", "GP", "GP", "PTS",
"PTS", "PTS", "PTS", "PTS", "REB", "REB", "REB", "REB", "REB"
), value = c(28, 30, 27, 27, 27, 103.5, 103.9, 108.2, 104.7,
97.6, 47.6, 53, 54.7, 56.8, 51.7), foragainst = c("for", "for",
"for", "for", "for", "for", "for", "for", "for", "for", "for",
"for", "for", "for", "for")), .Names = c("team.Abbreviation",
"stat", "value", "foragainst"), row.names = c(NA, -15L), class = c("tbl_df",
"tbl", "data.frame"))
mydf
# A tibble: 15 x 4
team.Abbreviation stat value foragainst
<chr> <chr> <dbl> <chr>
1 ATL GP 28.0 for
2 BOS GP 30.0 for
3 BRO GP 27.0 for
4 CHA GP 27.0 for
5 CHI GP 27.0 for
6 ATL PTS 103.5 for
7 BOS PTS 103.9 for
8 BRO PTS 108.2 for
9 CHA PTS 104.7 for
10 CHI PTS 97.6 for
11 ATL REB 47.6 for
12 BOS REB 53.0 for
13 BRO REB 54.7 for
14 CHA REB 56.8 for
15 CHI REB 51.7 for
The foragainst column can be ignored for the time being. For each stat (GP, PTS, REB in this case), I'd like to compute each team's ranking in that stat. There are 5 teams in this example. I am fairly certain that what I want is a dataframe with the same dimensions as mydf, that looks as such:
outputdf
# A tibble: 15 x 4
team.Abbreviation stat rank foragainst
<chr> <chr> <dbl> <chr>
1 ATL GP 2 for
2 BOS GP 1 for
3 BRO GP 3 for
4 CHA GP 3 for
5 CHI GP 3 for
6 ATL PTS 4 for
7 BOS PTS 3 for
8 BRO PTS 1 for
9 CHA PTS 2 for
10 CHI PTS 5 for
11 ATL REB 5 for
12 BOS REB 3 for
13 BRO REB 2 for
14 CHA REB 1 for
15 CHI REB 4 for
Examining the slice of 5 rows of this data where stat == PTS, notice that team.Abbrevation == BRO had the highest number of PTS, therefore it's rank is 1. CHI had the lowest number of PTS, therefore it's rank is 5. I don't particularly care how ties are handled, so it's not necessarily the case that the rank for BRO, CHA, and CHI must == 3 for the stat == GP.
I can probably get this done in a fairly inefficient manner with a for-loop, but I would like to find a dplyr (or other good packages) solution here. Thanks in advance!
Upvotes: 1
Views: 68
Reputation: 887028
We can use min_rank
library(dplyr)
mydf %>%
group_by(stat) %>%
mutate(rank = min_rank(-value)) %>%
select(team.Abbreviation, stat, rank, foragainst)
# A tibble: 15 x 4
# Groups: stat [3]
# team.Abbreviation stat rank foragainst
# <chr> <chr> <int> <chr>
# 1 ATL GP 2 for
# 2 BOS GP 1 for
# 3 BRO GP 3 for
# 4 CHA GP 3 for
# 5 CHI GP 3 for
# 6 ATL PTS 4 for
# 7 BOS PTS 3 for
# 8 BRO PTS 1 for
# 9 CHA PTS 2 for
#10 CHI PTS 5 for
#11 ATL REB 5 for
#12 BOS REB 3 for
#13 BRO REB 2 for
#14 CHA REB 1 for
#15 CHI REB 4 for
Or using ave
from base R
with(mydf, ave(-value, stat, FUN = function(x) rank(x, ties.method = "min")))
Upvotes: 3