Iterative subtraction, over many columns, based on condition within R

Question

My data contains statistics on the outcome of a soccer game, with 12806 observations (match outcomes) and 34 key performance indicators.

A (small) example of my data.frame is below:

head(Test)
  MatchID Outcome Var1 Var2 Var3 Var4 Var5
1      30    Loss    0   10    0   10    0
2      30     Win    6   13    6   13    6
3      31    Loss    8   12    3    6    3
4      31     Win   29   40    9   19    3
5      32    Loss    7   26    7   26    6
6      32     Win   11   20   11   20    9

For every unique "Match ID" I wish to deduct each of the losing (Outcome=="Loss" key performance indicators from the winning (Outcome=="Win") team. My data.set is not always arranged by Loss, Win, Loss, Win so completing this in a row.wise fashion may not be possible.

I have tried the following using dplyr:

 Differences <- Test %>% 
   group_by(MatchID) %>% 
   summarise_at( .vars = names(.)[3:7], ((Outcome == "Win") - (Outcome == "Loss")))

but fear I am using the wrong approach as I received the following error: Error in inherits(x, "fun_list") : object 'Outcome' not found

My anticipated outcome would be:

head(AnticipatedOutcome)
  MatchID Var1 Var2 Var3 Var4 Var5
1      30  6    3    6    3    6
3      31 21   28    6   13    0
5      32  4   -6    4   -6    3

Is this please possible to achieve, using dplyr?

akrun · Accepted Answer

The difference of two logical vectors is of the same length. We need to subset the 'Var' columns where 'Outcome' is "Win", take the sum of it and subtract it from those where the 'Outcome' is "Loss"

library(tidyverse)
Test %>%
    group_by(MatchID) %>%
    summarise_at(vars(starts_with('Var')),
              funs(sum(.[Outcome == "Win"]) - sum(.[Outcome == "Loss"])))
# A tibble: 3 x 6
#  MatchID  Var1  Var2  Var3  Var4  Var5
#         
#1      30     6     3     6     3     6
#2      31    21    28     6    13     0
#3      32     4    -6     4    -6     3

Or another option would be to gather into 'long' format, get the group by difference of sum and spread it to 'wide' format

Test %>% 
  gather(key, val, Var1:Var5) %>% 
  group_by(MatchID, key) %>%
  summarise(val = sum(val[Outcome == "Win"]) - sum(val[Outcome == "Loss"])) %>%
  spread(key, val)

Iterative subtraction, over many columns, based on condition within R

Answers (2)

Related Questions