Reputation: 3983
I want to sample only some elements of a vector from a sum of gaussians that is given by their means and covariance matrices.
Specifically:
I'm imputing data using gaussian mixture model (GMM). I'm using the following procedure and sklearn:
There are two problems that I see with this: (A) how do I sample from the sum of gaussians, and (B) how do I sample only part of the vector? I assume both can be solved at the same time. For (A), I could use rejection sampling or inverse transform sampling, but I feel there is a better way utilizing the multivariate normal distribution generators in numpy, or some other efficient method. For (B), I just need to multiply the sampled variable by a gaussian that takes the known values from the sample as an argument. Right?
I would prefer a solution in python but an algorithm or pseudocode would be sufficient.
Upvotes: 1
Views: 1317
Reputation: 3438
I believe this question amounts to a conditional probability question. For starters I will make an sklearn implementation with badly written code.
I will assume you already have an sklearn gaussian mixture model fitted to a dataset (the data you want to impute from). The following code block will fit one from a dataset:
import numpy
import sklearn.mixture

# NumberComponents and NumpyTwoDimensionalDataset are placeholders for
# the number of mixture components and an (n_samples, n_features) array.
GaussianMixtureObjectSklearn = sklearn.mixture.GaussianMixture(
    n_components = NumberComponents,
    covariance_type = 'full',
)
GaussianMixtureObjectSklearn.fit(NumpyTwoDimensionalDataset)
If you wanted to get back out a probability density you would do something like the following:
# Turn the model object into a single density function
def GaussianMixtureModelFunction( Point ):
    # score_samples returns the log-density, so exponentiate
    return numpy.exp( GaussianMixtureObjectSklearn.score_samples( numpy.atleast_2d( Point ) ) )
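As a minimal end-to-end sketch of the two blocks above (the toy data, seed, and component count here are made up for illustration):

```python
import numpy
import sklearn.mixture

# Toy dataset: two well-separated clusters in 2D
rng = numpy.random.default_rng(0)
data = numpy.vstack([rng.normal(0.0, 1.0, size=(100, 2)),
                     rng.normal(5.0, 1.0, size=(100, 2))])

gmm = sklearn.mixture.GaussianMixture(n_components=2, covariance_type='full')
gmm.fit(data)

def density(point):
    # score_samples returns the log-density, so exponentiate
    return numpy.exp(gmm.score_samples(numpy.atleast_2d(point)))[0]
```

The density should come out much larger near either cluster center than far away from both.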
To draw samples from the full GMM we could use the built-in method on the model object (note that sample returns both the points and their component labels):
samples, labels = GaussianMixtureObjectSklearn.sample( 1000 )
But instead, we want to generate a conditional sample, fixing some elements of the point and allowing the others to vary. The native sample method in sklearn won't work for this. The easiest solution is to instead get the weights, means, and covariances back out of the GMM:
weights = GaussianMixtureObjectSklearn.weights_
means = GaussianMixtureObjectSklearn.means_
covs = GaussianMixtureObjectSklearn.covariances_
First use the weights to pick a Gaussian:
chosen_gaussian_index = numpy.random.choice(len(weights), p=weights)
and then sample the chosen gaussian conditionally by dimension using this other Stack Overflow answer:
Python/Numpy: Conditional simulation from a multivariate distribution
gcov = covs[chosen_gaussian_index]
gmean = means[chosen_gaussian_index]
#TODO --> use the linked answer to sample a single gaussian conditionally
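A minimal sketch of that conditional step, using the standard conditional multivariate normal formulas (the function name, index-set convention, and variable names here are my own, not from the linked answer):

```python
import numpy

def sample_gaussian_conditional(mean, cov, known_idx, known_vals, rng=None):
    """Sample the unknown dimensions of N(mean, cov), conditioned on the
    dimensions in known_idx taking the values known_vals."""
    rng = numpy.random.default_rng(rng)
    d = len(mean)
    unknown_idx = [i for i in range(d) if i not in known_idx]

    # Partition the mean vector and covariance matrix
    mu_u = mean[unknown_idx]
    mu_k = mean[known_idx]
    cov_uu = cov[numpy.ix_(unknown_idx, unknown_idx)]
    cov_uk = cov[numpy.ix_(unknown_idx, known_idx)]
    cov_kk = cov[numpy.ix_(known_idx, known_idx)]

    # Conditional mean and covariance (Schur complement)
    cond_mean = mu_u + cov_uk @ numpy.linalg.solve(cov_kk, known_vals - mu_k)
    cond_cov = cov_uu - cov_uk @ numpy.linalg.solve(cov_kk, cov_uk.T)
    return rng.multivariate_normal(cond_mean, cond_cov)
```

For a 2D gaussian with mean (0, 1), unit variances, and covariance 0.8, conditioning on the second coordinate being 2 gives a conditional mean of 0 + 0.8 * (2 - 1) = 0.8 for the first coordinate.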
Upvotes: 0
Reputation: 3983
Since for sampling only the relative proportion of the distribution matters, the scaling prefactor can be thrown away. For a diagonal covariance matrix, one can just use the covariance submatrix and mean subvector that have the dimensions of the missing data. For a covariance with off-diagonal elements, the mean and std dev of the sampling gaussian will need to be adjusted by conditioning on the observed values.
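A sketch of the diagonal case (the toy numbers and variable names are my own): with a diagonal covariance the dimensions are independent, so conditioning on the observed values changes nothing, and sampling the missing dimensions is just sampling the corresponding sub-gaussian:

```python
import numpy

# Hypothetical single gaussian with diagonal covariance
mean = numpy.array([1.0, 2.0, 3.0])
cov = numpy.diag([0.5, 1.0, 2.0])

missing_idx = [0, 2]  # dimensions we want to impute

# With a diagonal covariance, the conditional distribution of the
# missing dimensions is just their marginal: take the subvector/submatrix
sub_mean = mean[missing_idx]
sub_cov = cov[numpy.ix_(missing_idx, missing_idx)]

rng = numpy.random.default_rng(0)
sample = rng.multivariate_normal(sub_mean, sub_cov)
```

With off-diagonal elements present, `sub_mean` and `sub_cov` would instead have to be the conditional mean and covariance, as in the full-covariance case above.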
Upvotes: 1