Reputation: 531
I have a dataset where two teams square off in an annual game. There are two divisions where these games take place, East and West. I want to determine who the reigning champion is for a given year based on results from a previous year's game. I'd like to do this for both divisions.
Here's my dataset:
data <- data.frame(
Team = c("Hot Dogs", "Hamburgers", "Hot Dogs", "Hamburgers", "Hot Dogs",
"Hamburgers", "Pho", "Ramen", "Pho", "Ramen", "Pho", "Ramen"),
Division = c("West", "West", "West", "West", "West", "West", "East", "East",
"East", "East", "East", "East"),
Year = c("2017", "2017", "2018", "2018", "2019", "2019", "2017", "2017",
"2018", "2018", "2019", "2019"),
Score = c("37", "2", "26", "32", "37", "9", "22", "31", "25", "32", "24", "18"))
Ideally I would add a "Results" column to the original data to indicate whether or not the given team is the reigning champion going into that game. Something like this:
data$Result <- c("Initial Champion", "NA", "Champion", "NA", "NA", "Champion", "NA",
"Initial Champion", "NA", "Champion", "NA", "Champion")
Is there a straightforward way to do this using R, specifically using the tidyverse
library if possible?
I appreciate any advice. Thanks in advance.
Upvotes: 1
Views: 281
Reputation: 46948
First we get a table that has all the champions and label them as "Initial Champion" if it is the first, and others as "Champion":
library(dplyr)
X = data %>%
arrange(Year,desc(Score)) %>%
group_by(Division) %>%
filter(!duplicated(Year))%>%
mutate(result=rep(c("Initial Champion","Champion"),times=c(1,n()-1)))
# A tibble: 6 x 5
# Groups: Division [2]
Team Division Year Score result
<fct> <fct> <fct> <fct> <chr>
1 Hot Dogs West 2017 37 Initial Champion
2 Ramen East 2017 31 Initial Champion
3 Hamburgers West 2018 32 Champion
4 Ramen East 2018 32 Champion
5 Hamburgers West 2019 9 Champion
6 Pho East 2019 24 Champion
To get your final table just do:
left_join(data,X)
Upvotes: 3
Reputation: 1261
In the following answer I used dplyr in order to determine initial champion and champion where initial champion means that the team in first occurrence of data was the best in year in its division. For later years a team with highest score in its division it is considered the champion.
library(dplyr)
data <- data.frame(
Team = c("Hot Dogs", "Hamburgers", "Hot Dogs", "Hamburgers", "Hot Dogs",
"Hamburgers", "Pho", "Ramen", "Pho", "Ramen", "Pho", "Ramen"),
Division = c("West", "West", "West", "West", "West", "West", "East", "East",
"East", "East", "East", "East"),
Year = c("2017", "2017", "2018", "2018", "2019", "2019", "2017", "2017",
"2018", "2018", "2019", "2019"),
Score = c("37", "2", "26", "32", "37", "9", "22", "31", "25", "32", "24", "18"),
stringsAsFactors = FALSE)
result <-
data %>%
group_by(Year, Division) %>% # First we group by each year and division
# For each division/year we get highest score then for the team with this score
# we consider it champion
mutate(high_score = as.character(max(as.numeric(Score), na.rm = TRUE)),
result = ifelse(high_score == Score, "Champion", NA_character_)) %>%
# Now to determine the initial champion we compare it with the first year
# if the row contains data of the first year in data then it is initial
mutate(result =
ifelse(min(data$Year) == Year & result == "Champion", "Initial Champion", result)) %>%
# Here we drop high_score column because it is not needed in final output
select(-high_score)
Upvotes: 0