Reputation: 1946
My data contains statistics on the outcome of a soccer game, with 12806 observations (match outcomes) and 34 key performance indicators.
A (small) example of my data.frame is below:
head(Test)
MatchID Outcome Var1 Var2 Var3 Var4 Var5
1 30 Loss 0 10 0 10 0
2 30 Win 6 13 6 13 6
3 31 Loss 8 12 3 6 3
4 31 Win 29 40 9 19 3
5 32 Loss 7 26 7 26 6
6 32 Win 11 20 11 20 9
For every unique "Match ID" I wish to deduct each of the losing (Outcome=="Loss" key performance indicators from the winning (Outcome=="Win") team. My data.set is not always arranged by Loss, Win, Loss, Win so completing this in a row.wise fashion may not be possible.
I have tried the following using dplyr:
Differences <- Test %>%
group_by(MatchID) %>%
summarise_at( .vars = names(.)[3:7], ((Outcome == "Win") - (Outcome == "Loss")))
but fear I am using the wrong approach as I received the following error: Error in inherits(x, "fun_list") : object 'Outcome' not found
My anticipated outcome would be:
head(AnticipatedOutcome)
MatchID Var1 Var2 Var3 Var4 Var5
1 30 6 3 6 3 6
3 31 21 28 6 13 0
5 32 4 -6 4 -6 3
Is this please possible to achieve, using dplyr?
Upvotes: 2
Views: 156
Reputation: 20095
One can use data.table
with .SDcols
argument to summarise the data. As @akrun has mentioned in his solution, sum
of "Loss" will be subtracted from the sum
of "Win" for each Match.
library(data.table)
setDT(df)
df[,lapply(.SD,function(x)sum(x[Outcome=="Win"]) - sum(x[Outcome=="Loss"])),
.SDcols = Var1:Var5,by=MatchID]
# MatchID Var1 Var2 Var3 Var4 Var5
# 1: 30 6 3 6 3 6
# 2: 31 21 28 6 13 0
# 3: 32 4 -6 4 -6 3
Note: Just for the sake of exploring different ideas, but one can achieve same result in even base-R
:
cbind(unique(df[1]), df[order(df$MatchID),][df$Outcome == "Win",3:7] -
df[order(df$MatchID),][df$Outcome == "Loss",3:7])
# MatchID Var1 Var2 Var3 Var4 Var5
# 1 30 6 3 6 3 6
# 3 31 21 28 6 13 0
# 5 32 4 -6 4 -6 3
Data:
df <- read.table(text =
"MatchID Outcome Var1 Var2 Var3 Var4 Var5
1 30 Loss 0 10 0 10 0
2 30 Win 6 13 6 13 6
3 31 Loss 8 12 3 6 3
4 31 Win 29 40 9 19 3
5 32 Loss 7 26 7 26 6
6 32 Win 11 20 11 20 9",
header =TRUE, stringsAsFactors = FALSE)
Upvotes: 1
Reputation: 887391
The difference of two logical vectors is of the same length. We need to subset the 'Var' columns where 'Outcome' is "Win"
, take the sum
of it and subtract it from those where the 'Outcome' is "Loss"
library(tidyverse)
Test %>%
group_by(MatchID) %>%
summarise_at(vars(starts_with('Var')),
funs(sum(.[Outcome == "Win"]) - sum(.[Outcome == "Loss"])))
# A tibble: 3 x 6
# MatchID Var1 Var2 Var3 Var4 Var5
# <int> <int> <int> <int> <int> <int>
#1 30 6 3 6 3 6
#2 31 21 28 6 13 0
#3 32 4 -6 4 -6 3
Or another option would be to gather
into 'long' format, get the group by difference of sum
and spread
it to 'wide' format
Test %>%
gather(key, val, Var1:Var5) %>%
group_by(MatchID, key) %>%
summarise(val = sum(val[Outcome == "Win"]) - sum(val[Outcome == "Loss"])) %>%
spread(key, val)
Upvotes: 4