Luke Steer

Reputation: 117

How do I apply a function to a subset of my data while retaining the entire data frame?

I'm working with NHL player performance data, and have a data frame with the following variables (among others). war_lost is a measure of player value lost over a full season due to player injury. The data spans 9 seasons, from 2009-2010 to 2017-2018.

   first_name last_name position_new season    team    weighted_games_played war_lost
   <chr>      <chr>     <chr>        <chr>     <chr>                   <dbl>    <dbl>
 CAREY      PRICE     G            2015-2016 MTL                      48.7     6.40
 SIDNEY     CROSBY    F            2011-2012 PIT                      48.6     5.59
 SIDNEY     CROSBY    F            2010-2011 PIT                      64.8     3.88
 COREY      CRAWFORD  G            2017-2018 CHI                      47.6     3.63
 JONATHAN   QUICK     G            2016-2017 LAK                      50.1     3.30
 STEVEN     STAMKOS   F            2013-2014 TBL                      41.0     2.81
 HENRIK     LUNDQVIST G            2014-2015 NYR                      76.9     2.30
 CONNOR     MCDAVID   F            2015-2016 EDM                      45.0     2.20
 ZACH       PARISE    F            2010-2011 NJD                      46.4     1.98
 JOHN       GIBSON    G            2014-2015 ANA                      23.0     1.96
 JOHAN      FRANZEN   F            2009-2010 DET                      39.0     1.94
 VIKTOR     FASTH     G            2013-2014 ANA                      18.0     1.89
 ANTON      KHUDOBIN  G            2013-2014 CAR                      36.0     1.86
 TOMAS      HERTL     F            2013-2014 SJS                      44.0     1.84
 STEVEN     STAMKOS   F            2016-2017 TBL                      43.3     1.82
 JONAS      HILLER    G            2010-2011 ANA                      53.6     1.80
 CAM        WARD      G            2009-2010 CAR                      46.0     1.78
 PAUL       MARTIN    D            2009-2010 NJD                      27.0     1.72
 ANTTI      RAANTA    G            2017-2018 ARI/PHX                  36.6     1.62
 LUBOMIR    VISNOVSKY D            2013-2014 NYI                      54.4     1.50

If a goaltender (position_new == "G") has played fewer than 45 games on average over the previous 3 years (weighted_games_played), then I'm going to consider them a back-up goaltender, and will multiply their war_lost by coefficient x to account for the number of games they would likely play out of the games they missed due to injury.

If a goaltender has played more than 45 games on average over the previous 3 years, then I'm going to consider them a starting goaltender, and will multiply their war_lost by coefficient y to account for the number of games they would likely play out of the games they missed due to injury.

I've considered a few different methods (writing a custom function, ifelse(), a purrr method), but I'm having a hard time wrapping my head around some of the underlying principles, chiefly how I should go about retaining all of my data while elegantly modifying the observations that are goaltenders. Perhaps something along the lines of:

data <- data %>%
    ifelse(position == "G",
           ifelse(weighted_games_played < 45, mutate(war_lost = 0.4 * war_lost), 
           mutate(war_lost = 0.6 * war_lost)),
           DO NOTHING IF NOT G)

Something along those lines? Suggestions very welcome!

Upvotes: 0

Views: 24

Answers (1)

IceCreamToucan

Reputation: 28705

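You can use dplyr::case_when inside mutate(). It evaluates the conditions in order and uses the first one that matches for each row, so the second condition only applies to goaltenders who did not match the first, and the final catch-all leaves every other row unchanged. If your data is called df, you can use the following code: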

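library(dplyr)

df <- df %>%
  mutate(war_lost = case_when(
    # back-up goaltenders: fewer than 45 weighted games played
    position_new == "G" & weighted_games_played < 45 ~ 0.4 * war_lost,
    # all remaining goaltenders are treated as starters
    position_new == "G" ~ 0.6 * war_lost,
    # every other position keeps its original value
    TRUE ~ war_lost
  ))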
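
To check the behaviour on a small scale, here is a minimal sketch using three rows copied from the question's table. It uses the position_new column name from that table. The names backup_coef and starter_coef are placeholders introduced here for illustration, and their values (0.4 and 0.6) are taken from the question's pseudocode rather than being recommended coefficients. The second pipeline shows what the nested ifelse() idea from the question might look like when it is applied to the column inside mutate() instead of to the whole data frame.

```r
library(dplyr)

# Three rows copied from the question's table
df <- data.frame(
  last_name             = c("PRICE", "GIBSON", "CROSBY"),
  position_new          = c("G", "G", "F"),
  weighted_games_played = c(48.7, 23.0, 48.6),
  war_lost              = c(6.40, 1.96, 5.59),
  stringsAsFactors      = FALSE
)

# Assumed coefficients, taken from the question's pseudocode
backup_coef  <- 0.4
starter_coef <- 0.6

# case_when(): the first matching condition wins for each row
adjusted <- df %>%
  mutate(war_lost = case_when(
    position_new == "G" & weighted_games_played < 45 ~ backup_coef * war_lost,
    position_new == "G"                              ~ starter_coef * war_lost,
    TRUE                                             ~ war_lost
  ))

# Same logic with nested base-R ifelse(), applied to the column itself
adjusted_ifelse <- df %>%
  mutate(war_lost = ifelse(
    position_new == "G",
    ifelse(weighted_games_played < 45, backup_coef, starter_coef) * war_lost,
    war_lost
  ))
```

Both versions return all three rows: PRICE is scaled by 0.6 (to 3.84), GIBSON by 0.4 (to 0.784), and CROSBY keeps the original 5.59. Because mutate() only overwrites the war_lost column, the rest of the data frame is retained as-is.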

Upvotes: 1
