user2716568
user2716568

Reputation: 1946

Iterative subtraction, over many columns, based on condition within R

My data contains statistics on the outcome of a soccer game, with 12806 observations (match outcomes) and 34 key performance indicators.

A (small) example of my data.frame is below:

head(Test)
  MatchID Outcome Var1 Var2 Var3 Var4 Var5
1      30    Loss    0   10    0   10    0
2      30     Win    6   13    6   13    6
3      31    Loss    8   12    3    6    3
4      31     Win   29   40    9   19    3
5      32    Loss    7   26    7   26    6
6      32     Win   11   20   11   20    9

For every unique "Match ID" I wish to deduct each of the losing (Outcome=="Loss" key performance indicators from the winning (Outcome=="Win") team. My data.set is not always arranged by Loss, Win, Loss, Win so completing this in a row.wise fashion may not be possible.

I have tried the following using dplyr:

 Differences <- Test %>% 
   group_by(MatchID) %>% 
   summarise_at( .vars = names(.)[3:7], ((Outcome == "Win") - (Outcome == "Loss")))

but fear I am using the wrong approach as I received the following error: Error in inherits(x, "fun_list") : object 'Outcome' not found

My anticipated outcome would be:

head(AnticipatedOutcome)
  MatchID Var1 Var2 Var3 Var4 Var5
1      30  6    3    6    3    6
3      31 21   28    6   13    0
5      32  4   -6    4   -6    3

Is this please possible to achieve, using dplyr?

Upvotes: 2

Views: 156

Answers (2)

MKR
MKR

Reputation: 20095

One can use data.table with .SDcols argument to summarise the data. As @akrun has mentioned in his solution, sum of "Loss" will be subtracted from the sum of "Win" for each Match.

library(data.table)

setDT(df)

df[,lapply(.SD,function(x)sum(x[Outcome=="Win"]) - sum(x[Outcome=="Loss"])), 
   .SDcols = Var1:Var5,by=MatchID]

#    MatchID Var1 Var2 Var3 Var4 Var5
# 1:      30    6    3    6    3    6
# 2:      31   21   28    6   13    0
# 3:      32    4   -6    4   -6    3

Note: Just for the sake of exploring different ideas, but one can achieve same result in even base-R:

cbind(unique(df[1]), df[order(df$MatchID),][df$Outcome == "Win",3:7] - 
        df[order(df$MatchID),][df$Outcome == "Loss",3:7])

#   MatchID Var1 Var2 Var3 Var4 Var5
# 1      30    6    3    6    3    6
# 3      31   21   28    6   13    0
# 5      32    4   -6    4   -6    3

Data:

df <- read.table(text =
"MatchID Outcome Var1 Var2 Var3 Var4 Var5
1      30    Loss    0   10    0   10    0
2      30     Win    6   13    6   13    6
3      31    Loss    8   12    3    6    3
4      31     Win   29   40    9   19    3
5      32    Loss    7   26    7   26    6
6      32     Win   11   20   11   20    9",
header =TRUE, stringsAsFactors = FALSE)

Upvotes: 1

akrun
akrun

Reputation: 887391

The difference of two logical vectors is of the same length. We need to subset the 'Var' columns where 'Outcome' is "Win", take the sum of it and subtract it from those where the 'Outcome' is "Loss"

library(tidyverse)
Test %>%
    group_by(MatchID) %>%
    summarise_at(vars(starts_with('Var')),
              funs(sum(.[Outcome == "Win"]) - sum(.[Outcome == "Loss"])))
# A tibble: 3 x 6
#  MatchID  Var1  Var2  Var3  Var4  Var5
#    <int> <int> <int> <int> <int> <int>
#1      30     6     3     6     3     6
#2      31    21    28     6    13     0
#3      32     4    -6     4    -6     3

Or another option would be to gather into 'long' format, get the group by difference of sum and spread it to 'wide' format

Test %>% 
  gather(key, val, Var1:Var5) %>% 
  group_by(MatchID, key) %>%
  summarise(val = sum(val[Outcome == "Win"]) - sum(val[Outcome == "Loss"])) %>%
  spread(key, val)

Upvotes: 4

Related Questions