Reputation: 117
I'm working with NHL player performance data, and have a data frame with the following variables (among others). war_lost is a measure of player value lost over a full season due to player injury. The data spans 9 seasons, from 2009-2010 to 2017-2018.
first_name last_name position_new season team weighted_games_played war_lost
<chr> <chr> <chr> <chr> <chr> <dbl> <dbl>
CAREY PRICE G 2015-2016 MTL 48.7 6.40
SIDNEY CROSBY F 2011-2012 PIT 48.6 5.59
SIDNEY CROSBY F 2010-2011 PIT 64.8 3.88
COREY CRAWFORD G 2017-2018 CHI 47.6 3.63
JONATHAN QUICK G 2016-2017 LAK 50.1 3.30
STEVEN STAMKOS F 2013-2014 TBL 41.0 2.81
HENRIK LUNDQVIST G 2014-2015 NYR 76.9 2.30
CONNOR MCDAVID F 2015-2016 EDM 45.0 2.20
ZACH PARISE F 2010-2011 NJD 46.4 1.98
JOHN GIBSON G 2014-2015 ANA 23.0 1.96
JOHAN FRANZEN F 2009-2010 DET 39.0 1.94
VIKTOR FASTH G 2013-2014 ANA 18.0 1.89
ANTON KHUDOBIN G 2013-2014 CAR 36.0 1.86
TOMAS HERTL F 2013-2014 SJS 44.0 1.84
STEVEN STAMKOS F 2016-2017 TBL 43.3 1.82
JONAS HILLER G 2010-2011 ANA 53.6 1.80
CAM WARD G 2009-2010 CAR 46.0 1.78
PAUL MARTIN D 2009-2010 NJD 27.0 1.72
ANTTI RAANTA G 2017-2018 ARI/PHX 36.6 1.62
LUBOMIR VISNOVSKY D 2013-2014 NYI 54.4 1.50
If a goaltender (position_new == "G") has played fewer than 45 games on average over the previous 3 years (weighted_games_played), then I'm going to consider them a back-up goaltender, and will multiply their war_lost by coefficient x to account for the number of games they would likely play out of the games they missed due to injury.
If a goaltender has played 45 or more games on average over the previous 3 years, then I'm going to consider them a starting goaltender, and will multiply their war_lost by coefficient y to account for the number of games they would likely play out of the games they missed due to injury.
I've considered a few different methods (writing a custom function, ifelse(), a purrr method), but I'm having a hard time wrapping my head around some of the underlying principles, chiefly how to retain all of my data while elegantly modifying only the observations that are goaltenders. Perhaps something along the lines of:
data <- data %>%
ifelse(position == "G",
ifelse(weighted_games_played < 45, mutate(war_lost = 0.4 * war_lost),
mutate(war_lost = 0.6 * war_lost)),
DO NOTHING IF NOT G)
Something along those lines? Suggestions very welcome!
Upvotes: 0
Views: 24
Reputation: 28705
You can use dplyr::case_when(). If your data is called df, you can use the following code:
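library(dplyr)

# Note: the column in the question's data is position_new, not position.
# Assign the result back (df <- df %>% ...) to keep the change.
df %>%
  mutate(war_lost = case_when(
    # Back-up goaltenders: fewer than 45 weighted games played
    position_new == "G" & weighted_games_played < 45 ~ 0.4 * war_lost,
    # Conditions are checked in order, so this catches the remaining goaltenders
    position_new == "G" ~ 0.6 * war_lost,
    # Everyone else keeps their original value
    TRUE ~ war_lost
  ))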
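For comparison, the nested ifelse() idea from the question can also work, as long as ifelse() goes inside the assignment rather than around mutate(). Below is a minimal sketch in base R on three rows copied from the question's data. The 0.4 and 0.6 coefficients are the placeholder values from the question's pseudocode, and the names sample_df, backup_coef and starter_coef are made up for illustration.

```r
# Three rows copied from the question's data (sample_df is a hypothetical name)
sample_df <- data.frame(
  last_name = c("PRICE", "GIBSON", "CROSBY"),
  position_new = c("G", "G", "F"),
  weighted_games_played = c(48.7, 23.0, 48.6),
  war_lost = c(6.40, 1.96, 5.59),
  stringsAsFactors = FALSE
)

# Placeholder coefficients taken from the question's pseudocode (assumptions)
backup_coef <- 0.4
starter_coef <- 0.6

# Logical vectors, one element per row
is_goalie <- sample_df$position_new == "G"
is_backup <- is_goalie & sample_df$weighted_games_played < 45

# ifelse() is vectorised: it returns one multiplier per row,
# with 1 for non-goaltenders so their war_lost is left unchanged
multiplier <- ifelse(is_backup, backup_coef,
                     ifelse(is_goalie, starter_coef, 1))

sample_df$war_lost <- sample_df$war_lost * multiplier
print(sample_df)
```

Here Price (48.7 weighted games) is scaled by the starter coefficient, Gibson (23.0) by the back-up coefficient, and Crosby, a forward, is left untouched. All rows and columns are retained.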
Upvotes: 1