Reputation: 5
For a project I'm implementing the CSP algorithm in which I need a covariance matrix. In multiple sources of literature I found the following expression for computing a covariance matrix: (X times X^{T}) / trace(X times X^{T}) for some matrix X which I implemented:
covariance = np.matmul(A, np.transpose(A)) / (np.matmul(np.dot(A, np.transpose(A))))
From a mathematical standpoint it makes sense in my computations but this expression simply does not give me the same result as the standard numpy implementation np.cov(A). My question is if the expression I found is actually a valid computation for covariance matrices?
https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0125039#pone.0125039.ref006 gives the mathematices under the section materials and methods
Upvotes: 0
Views: 623
Reputation: 8229
There are multiple issues with your formula. Let me do it step by step
First define a matrix A
with two variables and five observations each. Use random numbers for demonstrattion
np.random.seed(23423)
n_obs = 5
n_var = 2
A = np.random.normal(0.0,1.0, size=(n_var,n_obs))
A
output
array([[ 1.51978937, 0.56200696, -0.17770035, 0.01628744, 2.45856104],
[-1.25514739, -0.17201069, -0.42486057, -0.09215201, -0.21674946]])
Second, we need to center the variables, ie subtract their mean. A0
will be a centered version of A
A0 = A - A.mean(axis=1).reshape(n_var,-1)
A0
output
array([[ 0.64400048, -0.31378193, -1.05348924, -0.85950145, 1.58277214],
[-0.82296337, 0.26017334, 0.00732345, 0.34003201, 0.21543456]])
Now we can calculate the covariance, using a version of your formula (only the numerator, but divide by the number of observations -1)
(A0 @ A0.T)/(n_obs-1)
output
array([[ 1.21673642, -0.14265396],
[-0.14265396, 0.22676158]])
Let's compare this to np.cov
:
np.cov(A)
output
array([[ 1.21673642, -0.14265396],
[-0.14265396, 0.22676158]])
The two match as promised.
Your original formula looks more like a correlation not covariance; but it also has multiple issues (only one argument to matmul
where it needs two, lacks the centering step etc)
Upvotes: 2