Morts81

Reputation: 439

Covariance Matrix - R

I have a data frame of player statistics. I want to create a covariance matrix between players for the MB stat, to understand which players perform well together and which typically detract from each other.

Note that not all players play in each game.

I'd like to end up with something like the following, where 'x' is the relevant covariance value.

               Player.Name Damian Lillard C.J. McCollum Allen Crabbe Noah Vonleh  etc, etc
1           Damian Lillard              x             x            x           x
2            C.J. McCollum              x             x            x           x
3             Allen Crabbe              x             x            x           x
4              Noah Vonleh              x             x            x           x
5                 Ed Davis              x             x            x           x
6          Al-Farouq Aminu              x             x            x           x
7              Evan Turner              x             x            x           x
8         Maurice Harkless              x             x            x           x
9           Meyers Leonard              x             x            x           x
10           Mason Plumlee              x             x            x           x
11          Shabazz Napier              x             x            x           x

> df
          Player.Name  Tm   MB    DS     Game
1      Damian Lillard POR 54.8 59.50 20161025
11      C.J. McCollum POR 30.9 32.50 20161025
16       Allen Crabbe POR 24.1 28.25 20161025
19        Noah Vonleh POR 14.2 15.25 20161025
22           Ed Davis POR 17.9 18.00 20161025
26    Al-Farouq Aminu POR 16.3 18.25 20161025
34        Evan Turner POR 20.5 19.25 20161025
64   Maurice Harkless POR  4.7  5.25 20161025
65     Meyers Leonard POR  2.7  2.25 20161025
68      Mason Plumlee POR  4.7  4.00 20161025
290  Maurice Harkless POR 35.6 35.75 20161027
295     Mason Plumlee POR 36.6 36.75 20161027
299    Damian Lillard POR 41.5 44.25 20161027
309     C.J. McCollum POR 26.8 27.50 20161027
318      Allen Crabbe POR 17.2 16.25 20161027
349       Noah Vonleh POR  5.0  4.75 20161027
358       Evan Turner POR 10.7 10.50 20161027
359          Ed Davis POR  5.6  5.50 20161027
364    Shabazz Napier POR  0.0  0.00 20161027
369   Al-Farouq Aminu POR 13.6 13.25 20161027
545    Damian Lillard POR 56.5 58.25 20161029
557     C.J. McCollum POR 49.5 51.25 20161029
610     Mason Plumlee POR 22.9 22.50 20161029
611      Allen Crabbe POR 22.6 22.75 20161029
637       Evan Turner POR 15.6 16.75 20161029
649   Al-Farouq Aminu POR 27.9 28.25 20161029
673          Ed Davis POR  8.9  9.50 20161029
704       Noah Vonleh POR  4.8  5.00 20161029
719  Maurice Harkless POR  9.6 11.00 20161029
723    Meyers Leonard POR  6.2  6.25 20161029
728    Shabazz Napier POR  0.0  0.00 20161029

data

structure(list(PlayerName = c("Damian Lillard", "C.J. McCollum", 
"Allen Crabbe", "Noah Vonleh", "Ed Davis", "Al-Farouq Aminu", 
"Evan Turner", "Maurice Harkless", "Meyers Leonard", "Mason Plumlee", 
"Maurice Harkless", "Mason Plumlee", "Damian Lillard", "C.J. McCollum", 
"Allen Crabbe", "Noah Vonleh", "Evan Turner", "Ed Davis", "Shabazz Napier", 
"Al-Farouq Aminu", "Damian Lillard", "C.J. McCollum", "Mason Plumlee", 
"Allen Crabbe", "Evan Turner", "Al-Farouq Aminu", "Ed Davis", 
"Noah Vonleh", "Maurice Harkless", "Meyers Leonard", "Shabazz Napier"
), TM = c("POR", "POR", "POR", "POR", "POR", "POR", "POR", "POR", 
"POR", "POR", "POR", "POR", "POR", "POR", "POR", "POR", "POR", 
"POR", "POR", "POR", "POR", "POR", "POR", "POR", "POR", "POR", 
"POR", "POR", "POR", "POR", "POR"), MB = c(54.8, 30.9, 24.1, 
14.2, 17.9, 16.3, 20.5, 4.7, 2.7, 4.7, 35.6, 36.6, 41.5, 26.8, 
17.2, 5, 10.7, 5.6, 0, 13.6, 56.5, 49.5, 22.9, 22.6, 15.6, 27.9, 
8.9, 4.8, 9.6, 6.2, 0), DS = c(59.5, 32.5, 28.25, 15.25, 18, 
18.25, 19.25, 5.25, 2.25, 4, 35.75, 36.75, 44.25, 27.5, 16.25, 
4.75, 10.5, 5.5, 0, 13.25, 58.25, 51.25, 22.5, 22.75, 16.75, 
28.25, 9.5, 5, 11, 6.25, 0), Game = c(20161025L, 20161025L, 20161025L, 
20161025L, 20161025L, 20161025L, 20161025L, 20161025L, 20161025L, 
20161025L, 20161027L, 20161027L, 20161027L, 20161027L, 20161027L, 
20161027L, 20161027L, 20161027L, 20161027L, 20161027L, 20161029L, 
20161029L, 20161029L, 20161029L, 20161029L, 20161029L, 20161029L, 
20161029L, 20161029L, 20161029L, 20161029L)), .Names = c("PlayerName", 
"TM", "MB", "DS", "Game"), row.names = c(NA, -31L), class = "data.frame")

Upvotes: 1

Views: 1686

Answers (2)

bouncyball

Reputation: 10761

I think what you first need to do is reshape the data from long to wide, so that each row is a game and each column holds one player's MB values. Suppose our data is in dat:

dat <- dat[,-c(2,4)] #remove team name and DS
#names left in data.frame
names(dat)
[1] "PlayerName" "MB"         "Game"      

#reshape from long to wide
dat.wide <- reshape(dat, direction = 'wide',idvar = 'Game',
        timevar = 'PlayerName')

dat.wide[1:4, 1:4]
       Game MB.Damian Lillard MB.C.J. McCollum MB.Allen Crabbe
1  20161025              54.8             30.9            24.1
11 20161027              41.5             26.8            17.2
21 20161029              56.5             49.5            22.6

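#compute using cov function; for each pair of players, pairwise deletion
#uses only the games in which both players have a value
cov_m <- cov(dat.wide[,-1], use = 'pairwise.complete.obs')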
cov_m[1:4,1:4]

                  MB.Damian Lillard MB.C.J. McCollum MB.Allen Crabbe MB.Noah Vonleh
MB.Damian Lillard          67.46333         71.10833          28.370          17.23
MB.C.J. McCollum           71.10833        146.34333          20.495         -23.61
MB.Allen Crabbe            28.37000         20.49500          13.170          12.75
MB.Noah Vonleh             17.23000        -23.61000          12.750          28.84
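The same idea can be sketched in base R with tapply(), which keeps the plain player names as row and column labels. This is only an illustrative sketch: the data frame below is a hand-copied subset of the question's data, the names wide and cov_m are arbitrary, and it assumes at most one row per player per game.

```r
# Subset of the question's data (MB only). Meyers Leonard and Shabazz Napier
# each miss one game, so the wide matrix will contain NAs.
dat <- data.frame(
  PlayerName = c("Damian Lillard", "C.J. McCollum", "Meyers Leonard",
                 "Damian Lillard", "C.J. McCollum", "Shabazz Napier",
                 "Damian Lillard", "C.J. McCollum", "Meyers Leonard",
                 "Shabazz Napier"),
  MB   = c(54.8, 30.9, 2.7, 41.5, 26.8, 0, 56.5, 49.5, 6.2, 0),
  Game = c(20161025L, 20161025L, 20161025L,
           20161027L, 20161027L, 20161027L,
           20161029L, 20161029L, 20161029L, 20161029L),
  stringsAsFactors = FALSE
)

# Pivot to a Game x Player matrix. Assumes one row per player per game,
# so mean() just returns that single value; missing cells become NA.
wide <- tapply(dat$MB, list(dat$Game, dat$PlayerName), mean)

# Covariance between players, using for each pair only the games both played.
cov_m <- cov(wide, use = "pairwise.complete.obs")
print(round(cov_m, 3))
```

Pairs of players who share fewer than two games have no usable covariance and come out as NA, so with only a handful of games the matrix may be sparse.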

Upvotes: 1

Keith Hughitt

Reputation: 4960

You can use the cov() function to achieve this, e.g. (with the data frame in x):

cov_mat <- cov(t(x[,3:4]))
rownames(cov_mat) <- x$PlayerName
colnames(cov_mat) <- x$PlayerName


> cov_mat[1:3,1:3]
               Damian Lillard C.J. McCollum Allen Crabbe
Damian Lillard        11.0450          3.76      9.75250
C.J. McCollum          3.7600          1.28      3.32000
Allen Crabbe           9.7525          3.32      8.61125

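Note that because of the transpose, each entry is the covariance between two rows of x across the MB and DS columns, so the matrix has one row and column per row of x rather than one per player.

If you want to compute correlations instead, just swap cov() for cor().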
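A minimal sketch of that swap, assuming the MB values are arranged as a Game x Player matrix. The matrix here is typed in by hand from the three players who appear in all three games of the question's data, and the names mb, cov_m and cor_m are illustrative only.

```r
# Game x Player matrix of MB values, copied from the question's data.
mb <- cbind(
  "Damian Lillard" = c(54.8, 41.5, 56.5),
  "C.J. McCollum"  = c(30.9, 26.8, 49.5),
  "Allen Crabbe"   = c(24.1, 17.2, 22.6)
)
rownames(mb) <- c("20161025", "20161027", "20161029")

cov_m <- cov(mb)  # covariance between players (columns)
cor_m <- cor(mb)  # same call shape, correlation instead

# With no missing values, cor() agrees with rescaling the covariance matrix.
print(all.equal(cor_m, cov2cor(cov_m)))
```

Correlation is unit-free, which can make pairs of players easier to compare than raw covariances when their scoring levels differ a lot.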

Upvotes: 1
