Reputation: 1533
I'm trying to calculate the sample covariance of some given data.
The code I wrote is:
import numpy as np

def calcCov(x):
    m, n = x.shape
    mean = np.mean(x, axis=0)
    cov = np.zeros((n, n))
    for j in range(n):
        for k in range(n):
            total = 0.0  # renamed from "sum" to avoid shadowing the built-in
            for i in range(m):
                total += (x[i, j] - mean[j]) * (x[i, k] - mean[k])
            cov[j, k] = total / (m - 1.0)
    return cov
It is not the most efficient way to do this, but it is simple and is a direct copy of https://en.wikipedia.org/wiki/Sample_mean_and_covariance#Sample_covariance to the best of my knowledge.
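For reference, the same formula from that Wikipedia page can be written without explicit loops; this is a minimal vectorized sketch (the function name `calc_cov_vectorized` is mine, not from the question) that centers the data and computes the matrix product directly:

```python
import numpy as np

def calc_cov_vectorized(x):
    # Sample covariance: center each column, then (X_c^T X_c) / (m - 1)
    m = x.shape[0]
    xc = x - x.mean(axis=0)
    return xc.T @ xc / (m - 1.0)
```

This should produce the same matrix as the loop-based version (and as `np.cov(x, rowvar=False)`), just much faster for large arrays.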
A covariance matrix is always positive semidefinite. But when I calculate the eigenvalues (with np.linalg.eig) I sometimes see negative eigenvalues.
For example, the code
data = np.random.rand(2, 2)
print(data)
cov = calcCov(data)
eigvals, eigvec = np.linalg.eig(cov)
print(cov)
print(eigvals)
prints the output
[[ 0.12873309 0.92079275]
[ 0.90018866 0.73197021]]
[[ 0.29757185 -0.0728341 ]
[-0.0728341 0.01782698]]
[ 3.15398823e-01 -3.46944695e-18]
As a mathematician, I find this very unsettling. Why does it happen? Is it simple numerical error? Did I make a mistake in my calculation of the covariance?
Upvotes: 1
Views: 2031
Reputation: 7476
First, I would suggest using numpy's covariance function, since it will be more efficient: https://docs.scipy.org/doc/numpy-1.10.1/reference/generated/numpy.cov.html
Given that the "negative" eigenvalue you have is on the order of 1e-18, it is fair to consider it zero up to numerical error.
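As a sketch of what that looks like in practice (the seeding and clipping here are my own suggestion, not part of the question): since a covariance matrix is symmetric, `np.linalg.eigvalsh` is the better choice anyway, as it exploits symmetry and returns real eigenvalues; tiny negatives can then be clipped to zero.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.random((2, 2))

# rowvar=False treats each column as a variable,
# matching the loop-based calcCov in the question.
cov = np.cov(data, rowvar=False)

# eigvalsh is designed for symmetric matrices and returns real eigenvalues
eigvals = np.linalg.eigvalsh(cov)

# round-off can leave values like -3e-18; clip them to exactly zero
eigvals = np.clip(eigvals, 0.0, None)
```

After clipping, all eigenvalues are nonnegative, consistent with the matrix being positive semidefinite in exact arithmetic.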
Upvotes: 4