Bas Carpentero
Bas Carpentero

Reputation: 5

Numpy covariance matrix implementation

For a project I'm implementing the CSP algorithm in which I need a covariance matrix. In multiple sources of literature I found the following expression for computing a covariance matrix: (X times X^{T}) / trace(X times X^{T}) for some matrix X which I implemented:

covariance = np.matmul(A, np.transpose(A)) / (np.matmul(np.dot(A, np.transpose(A))))

From a mathematical standpoint it makes sense in my computations but this expression simply does not give me the same result as the standard numpy implementation np.cov(A). My question is if the expression I found is actually a valid computation for covariance matrices?

https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0125039#pone.0125039.ref006 gives the mathematices under the section materials and methods

Upvotes: 0

Views: 623

Answers (1)

piterbarg
piterbarg

Reputation: 8229

There are multiple issues with your formula. Let me do it step by step

First define a matrix A with two variables and five observations each. Use random numbers for demonstrattion

np.random.seed(23423)
n_obs = 5
n_var = 2
A = np.random.normal(0.0,1.0, size=(n_var,n_obs))
A

output

array([[ 1.51978937,  0.56200696, -0.17770035,  0.01628744,  2.45856104],
       [-1.25514739, -0.17201069, -0.42486057, -0.09215201, -0.21674946]])

Second, we need to center the variables, ie subtract their mean. A0 will be a centered version of A

A0 = A - A.mean(axis=1).reshape(n_var,-1)
A0

output

array([[ 0.64400048, -0.31378193, -1.05348924, -0.85950145,  1.58277214],
       [-0.82296337,  0.26017334,  0.00732345,  0.34003201,  0.21543456]])

Now we can calculate the covariance, using a version of your formula (only the numerator, but divide by the number of observations -1)

(A0 @ A0.T)/(n_obs-1)

output

array([[ 1.21673642, -0.14265396],
       [-0.14265396,  0.22676158]])

Let's compare this to np.cov:

np.cov(A)

output

array([[ 1.21673642, -0.14265396],
       [-0.14265396,  0.22676158]])

The two match as promised.

Your original formula looks more like a correlation not covariance; but it also has multiple issues (only one argument to matmul where it needs two, lacks the centering step etc)

Upvotes: 2

Related Questions