rumdrums
rumdrums

Reputation: 1382

Numpy/Scipy: Singular Matrix Calculating probability of multivariate observation

I'm trying to calculate probabilities for observations in matrices where my rows are observations and my columns are features using python. I always get singular matrix errors, even when using random matrices, so I suspect something is wrong with my code:

from scipy.stats import multivariate_normal
import numpy as np


def get_mean(x, axis=0):
  return x.mean(axis=axis)


def get_sigma(x):
  return np.cov(x, rowvar=False)


def get_probabilities(x, mu, sigma):
  return multivariate_normal.pdf(x, mean=mu, cov=sigma)


x = np.random.rand(10,10)
t = np.random.rand(1, 10)

mu = get_mean(x)
sigma = get_sigma(x)
p = get_probabilities(t, mu, sigma)

This results in:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 2, in get_probabilities
  File "/usr/local/lib/python3.5/dist-packages/scipy/stats/_multivariate.py", line 512, in pdf
    psd = _PSD(cov, allow_singular=allow_singular)
  File "/usr/local/lib/python3.5/dist-packages/scipy/stats/_multivariate.py", line 159, in __init__
    raise np.linalg.LinAlgError('singular matrix')
numpy.linalg.linalg.LinAlgError: singular matrix

What am I doing wrong?

Upvotes: 1

Views: 1618

Answers (2)

MB-F
MB-F

Reputation: 23637

There is not enough data to estimate the covariance matrix sigma. x needs more rows than columns to estimate a well defined covariance matrix.

This for example will let the code run without problems:

x = np.random.rand(100, 10)

Alternatively, if you simply don't have enough data, you can use regularization (or shrinkage) to better condition the covariance matrix:

sigma += np.eye(10) * 1e-3  # problem: how to chose the factor

Scikit-learn has the ledoit_wolf covariance estimator, which automatically determines the amount of regularization based on the available data.

Upvotes: 3

Flo Martin
Flo Martin

Reputation: 131

I think the problem is that t is an np.array of shape (1,10) while it should be of shape (10,).

If you replace the line of code

t = np.random.rand(1, 10)

by

t = np.random.rand(10)

this works.

Upvotes: 1

Related Questions