Reputation: 439
I have a data frame of player statistics. I want to create a covariance matrix between players for the MB stat, to understand which players perform well together and which typically detract from each other.
Note that not all players play in each game.
I'd like to have something like the following, where 'x' is the relevant covariance value.
Player.Name Damian Lillard C.J. McCollum Allen Crabbe Noah Vonleh etc, etc
1 Damian Lillard x x x x
2 C.J. McCollum x x x x
3 Allen Crabbe x x x x
4 Noah Vonleh x x x x
5 Ed Davis x x x x
6 Al-Farouq Aminu x x x x
7 Evan Turner x x x x
8 Maurice Harkless x x x x
9 Meyers Leonard x x x x
10 Mason Plumlee x x x x
11 Shabazz Napier x x x x
> df
Player.Name Tm MB DS Game
1 Damian Lillard POR 54.8 59.50 20161025
11 C.J. McCollum POR 30.9 32.50 20161025
16 Allen Crabbe POR 24.1 28.25 20161025
19 Noah Vonleh POR 14.2 15.25 20161025
22 Ed Davis POR 17.9 18.00 20161025
26 Al-Farouq Aminu POR 16.3 18.25 20161025
34 Evan Turner POR 20.5 19.25 20161025
64 Maurice Harkless POR 4.7 5.25 20161025
65 Meyers Leonard POR 2.7 2.25 20161025
68 Mason Plumlee POR 4.7 4.00 20161025
290 Maurice Harkless POR 35.6 35.75 20161027
295 Mason Plumlee POR 36.6 36.75 20161027
299 Damian Lillard POR 41.5 44.25 20161027
309 C.J. McCollum POR 26.8 27.50 20161027
318 Allen Crabbe POR 17.2 16.25 20161027
349 Noah Vonleh POR 5.0 4.75 20161027
358 Evan Turner POR 10.7 10.50 20161027
359 Ed Davis POR 5.6 5.50 20161027
364 Shabazz Napier POR 0.0 0.00 20161027
369 Al-Farouq Aminu POR 13.6 13.25 20161027
545 Damian Lillard POR 56.5 58.25 20161029
557 C.J. McCollum POR 49.5 51.25 20161029
610 Mason Plumlee POR 22.9 22.50 20161029
611 Allen Crabbe POR 22.6 22.75 20161029
637 Evan Turner POR 15.6 16.75 20161029
649 Al-Farouq Aminu POR 27.9 28.25 20161029
673 Ed Davis POR 8.9 9.50 20161029
704 Noah Vonleh POR 4.8 5.00 20161029
719 Maurice Harkless POR 9.6 11.00 20161029
723 Meyers Leonard POR 6.2 6.25 20161029
728 Shabazz Napier POR 0.0 0.00 20161029
structure(list(PlayerName = c("Damian Lillard", "C.J. McCollum",
"Allen Crabbe", "Noah Vonleh", "Ed Davis", "Al-Farouq Aminu",
"Evan Turner", "Maurice Harkless", "Meyers Leonard", "Mason Plumlee",
"Maurice Harkless", "Mason Plumlee", "Damian Lillard", "C.J. McCollum",
"Allen Crabbe", "Noah Vonleh", "Evan Turner", "Ed Davis", "Shabazz Napier",
"Al-Farouq Aminu", "Damian Lillard", "C.J. McCollum", "Mason Plumlee",
"Allen Crabbe", "Evan Turner", "Al-Farouq Aminu", "Ed Davis",
"Noah Vonleh", "Maurice Harkless", "Meyers Leonard", "Shabazz Napier"
), TM = c("POR", "POR", "POR", "POR", "POR", "POR", "POR", "POR",
"POR", "POR", "POR", "POR", "POR", "POR", "POR", "POR", "POR",
"POR", "POR", "POR", "POR", "POR", "POR", "POR", "POR", "POR",
"POR", "POR", "POR", "POR", "POR"), MB = c(54.8, 30.9, 24.1,
14.2, 17.9, 16.3, 20.5, 4.7, 2.7, 4.7, 35.6, 36.6, 41.5, 26.8,
17.2, 5, 10.7, 5.6, 0, 13.6, 56.5, 49.5, 22.9, 22.6, 15.6, 27.9,
8.9, 4.8, 9.6, 6.2, 0), DS = c(59.5, 32.5, 28.25, 15.25, 18,
18.25, 19.25, 5.25, 2.25, 4, 35.75, 36.75, 44.25, 27.5, 16.25,
4.75, 10.5, 5.5, 0, 13.25, 58.25, 51.25, 22.5, 22.75, 16.75,
28.25, 9.5, 5, 11, 6.25, 0), Game = c(20161025L, 20161025L, 20161025L,
20161025L, 20161025L, 20161025L, 20161025L, 20161025L, 20161025L,
20161025L, 20161027L, 20161027L, 20161027L, 20161027L, 20161027L,
20161027L, 20161027L, 20161027L, 20161027L, 20161027L, 20161029L,
20161029L, 20161029L, 20161029L, 20161029L, 20161029L, 20161029L,
20161029L, 20161029L, 20161029L, 20161029L)), .Names = c("PlayerName",
"TM", "MB", "DS", "Game"), row.names = c(NA, -31L), class = "data.frame")
Upvotes: 1
Views: 1686
Reputation: 10761
I think you first need to reshape the data so that each row is a game and each column holds one player's MB for that game. Suppose our data is in dat:
dat <- dat[,-c(2,4)] #remove team name and DS
#names left in data.frame
names(dat)
[1] "PlayerName" "MB" "Game"
#reshape from long to wide
dat.wide <- reshape(dat, direction = 'wide',idvar = 'Game',
timevar = 'PlayerName')
dat.wide[1:4, 1:4]
Game MB.Damian Lillard MB.C.J. McCollum MB.Allen Crabbe
1 20161025 54.8 30.9 24.1
11 20161027 41.5 26.8 17.2
21 20161029 56.5 49.5 22.6
#compute using cov function
cov_m <- cov(dat.wide[,-1], use = 'pairwise.complete.obs')
cov_m[1:4,1:4]
MB.Damian Lillard MB.C.J. McCollum MB.Allen Crabbe MB.Noah Vonleh
MB.Damian Lillard 67.46333 71.10833 28.370 17.23
MB.C.J. McCollum 71.10833 146.34333 20.495 -23.61
MB.Allen Crabbe 28.37000 20.49500 13.170 12.75
MB.Noah Vonleh 17.23000 -23.61000 12.750 28.84
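As a rough alternative sketch of the same reshape-then-cov idea, base R's tapply() can build the games-by-players matrix directly, which also avoids the "MB." prefix on the names. The small data frame below is a hand-copied subset of the question's data (three players, one of whom misses a game), and the names mb_wide and cov_m are just illustrative.

```r
# Subset of the question's data: Meyers Leonard did not play on 20161027
dat <- data.frame(
  PlayerName = c("Damian Lillard", "C.J. McCollum", "Meyers Leonard",
                 "Damian Lillard", "C.J. McCollum",
                 "Damian Lillard", "C.J. McCollum", "Meyers Leonard"),
  MB = c(54.8, 30.9, 2.7, 41.5, 26.8, 56.5, 49.5, 6.2),
  Game = c(20161025L, 20161025L, 20161025L, 20161027L, 20161027L,
           20161029L, 20161029L, 20161029L),
  stringsAsFactors = FALSE
)

# Games in rows, players in columns; a player/game pair with no row becomes NA
mb_wide <- with(dat, tapply(MB, list(Game, PlayerName), sum))

# Pairwise covariance: each pair uses only the games both players appeared in
cov_m <- cov(mb_wide, use = "pairwise.complete.obs")
cov_m
```

Because each pair is computed from its own set of shared games, entries for players who rarely overlap rest on very few observations and should be read with that in mind.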
Upvotes: 1
Reputation: 4960
You can use the cov() function to achieve this, e.g.:
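# x is the data frame from the question; columns 3:4 are MB and DS
# t() turns each row (one player in one game) into a column,
# so cov() compares rows across those two stats
cov_mat <- cov(t(x[,3:4]))
rownames(cov_mat) <- x$PlayerName
colnames(cov_mat) <- x$PlayerName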
> cov_mat[1:3,1:3]
Damian Lillard C.J. McCollum Allen Crabbe
Damian Lillard 11.0450 3.76 9.75250
C.J. McCollum 3.7600 1.28 3.32000
Allen Crabbe 9.7525 3.32 8.61125
If you want to compute correlations instead, just swap cov() for cor().
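To illustrate the swap, here is a minimal sketch that assumes the data has already been arranged with one row per game and one column per player (a different layout from x[,3:4] above). The MB values are copied from the question for three players; mb_wide and cor_m are illustrative names only.

```r
# MB per game for three players (values taken from the question)
mb_wide <- cbind(
  "Damian Lillard" = c(54.8, 41.5, 56.5),
  "C.J. McCollum"  = c(30.9, 26.8, 49.5),
  "Allen Crabbe"   = c(24.1, 17.2, 22.6)
)
rownames(mb_wide) <- c("20161025", "20161027", "20161029")

# Same call shape as cov(), but the result is scaled to the range [-1, 1]
cor_m <- cor(mb_wide, use = "pairwise.complete.obs")
cor_m
```

Correlation is often easier to compare across player pairs than covariance, since it does not depend on how large each player's MB values are.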
Upvotes: 1