Reputation: 645
I have a tbl_df that I'm trying to create unique columns based on a variety of filters. I did some reading on using ifelse, and other mutate functions but wasn't able to figure this one out on my own. The df is named Test and is listed below.
name team salary season position
<chr> <int> <int> <int> <chr>
AP 6 15 2017 OF
AN 11 8 2016 SP
AS 1 8 2014 SP
AR 3 11 2018 SS
AB 2 10 2015 3B
AC 8 7 2017 RP
Abe 11 10 2016 OF
AM 7 12 2014 RP
Ari 11 48 2018 1B
BH 13 29 2015 OF
I'm trying to create a variety of columns based on the results of specific filters. The code I have so far is as follows:
summary <- test %>%
group_by(team, season)
The mutate columns I'm trying to create are:
I want all of these columns and results to be grouped by team, and season (as shown above)
Upvotes: 0
Views: 369
Reputation: 374
You may have struggled to work this out because it looks like you're trying to summarise your data rather than mutate it. As such, you can use ifelse()
within summarise()
– rather than mutate()
– to do this.
library(tidyverse)
test %>%
group_by(team, season) %>%
summarise(
players = n(),
hitting = sum(ifelse(!position %in% c("SP", "RP"), salary, 0)),
pitching = sum(ifelse(position %in% c("SP", "RP"), salary, 0)),
relievers = sum(ifelse(position == "RP", 1, 0)),
over_40 = sum(ifelse(salary > 40, 1, 0)),
over_40_h = sum(ifelse(salary > 40 & !position %in% c("SP", "RP"), 1, 0))
) %>%
mutate(
hitting = ifelse(season < 2016 | season > 2018, NA, hitting),
pitching = ifelse(season < 2016 | season > 2018, NA, pitching)
) %>%
arrange(team, season)
Note that:
mutate()
command is included to remove the sum of pitcher and hitter salaries for years that are not between 2016 and 2018, as you specified in the question. However, since you've grouped the data by season anyway it may be equally easy to simply ignore the salaries for years you're not interested in.dplyr
, I've changed the column names to follow the tidyverse style guide.Upvotes: 1