Reputation: 3983
I want to sample only some elements of a vector from a sum of gaussians that is given by their means and covariance matrices.
Specifically:
I'm imputing data using gaussian mixture model (GMM). I'm using the following procedure and sklearn:
There are two problems that I see with this: (A) how do I sample from the sum of gaussians, and (B) how do I sample only part of the vector? I assume both can be solved at the same time. For (A), I could use rejection sampling or inverse transform sampling, but I feel there is a better way utilizing the multivariate normal distribution generators in numpy, or some other efficient method. For (B), I just need to multiply the sampled variable by a gaussian that takes the known values from the sample as an argument. Right?
I would prefer a solution in python but an algorithm or pseudocode would be sufficient.
Upvotes: 1
Views: 1317
Reputation: 3438
I believe this question amounts to a conditional probability question. For starters I will make an sklearn implementation with badly written code.
I will assume you already have an sklearn gaussian mixture model fitted to a dataset (the data you want to impute from). The following code block will fit one from a dataset:
import numpy
import sklearn.mixture

# NumberComponents and NumpyTwoDimensionalDataset are placeholders for
# the number of mixture components and an (n_samples, n_features) array.
GaussianMixtureObjectSklearn = sklearn.mixture.GaussianMixture(
    n_components = NumberComponents,
    covariance_type = 'full',
)
GaussianMixtureObjectSklearn.fit(NumpyTwoDimensionalDataset)
If you wanted to get back out a probability density you would do something like the following:
# Turn the model object into a single density function
def GaussianMixtureModelFunction( Point ):
    # score_samples returns the log-density, so exponentiate
    return numpy.exp( GaussianMixtureObjectSklearn.score_samples( numpy.atleast_2d( Point ) ) )
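As a minimal end-to-end sketch of the two blocks above (the toy data, seed, and component count here are made up for illustration):

```python
import numpy
import sklearn.mixture

# Toy dataset: two well-separated clusters in 2D
rng = numpy.random.default_rng(0)
data = numpy.vstack([rng.normal(0.0, 1.0, size=(100, 2)),
                     rng.normal(5.0, 1.0, size=(100, 2))])

gmm = sklearn.mixture.GaussianMixture(n_components=2, covariance_type='full')
gmm.fit(data)

def density(point):
    # score_samples returns the log-density, so exponentiate
    return numpy.exp(gmm.score_samples(numpy.atleast_2d(point)))[0]
```

The density should come out much larger near either cluster center than far away from both.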
To draw samples from the full GMM we could use the built-in method on the model object (note that sample returns both the points and their component labels):
samples, labels = GaussianMixtureObjectSklearn.sample( 1000 )
But instead, we want to generate a conditional sample, fixing some elements of the point and allowing the others to vary. The native sample method in sklearn won't work for this. The easiest solution is to instead get the weights, means, and covariances back out of the GMM:
weights = GaussianMixtureObjectSklearn.weights_
means = GaussianMixtureObjectSklearn.means_
covs = GaussianMixtureObjectSklearn.covariances_
First use the weights to pick a Gaussian:
chosen_gaussian_index = numpy.random.choice(len(weights), p=weights)
and then sample the chosen gaussian conditionally by dimension using this other Stack Overflow answer:
Python/Numpy: Conditional simulation from a multivariate distribution
gcov = covs[chosen_gaussian_index]
gmean = means[chosen_gaussian_index]
#TODO --> use the linked answer to sample a single gaussian conditionally
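A minimal sketch of that conditional step, using the standard conditional multivariate normal formulas (the function name, index-set convention, and variable names here are my own, not from the linked answer):

```python
import numpy

def sample_gaussian_conditional(mean, cov, known_idx, known_vals, rng=None):
    """Sample the unknown dimensions of N(mean, cov), conditioned on the
    dimensions in known_idx taking the values known_vals."""
    rng = numpy.random.default_rng(rng)
    d = len(mean)
    unknown_idx = [i for i in range(d) if i not in known_idx]

    # Partition the mean vector and covariance matrix
    mu_u = mean[unknown_idx]
    mu_k = mean[known_idx]
    cov_uu = cov[numpy.ix_(unknown_idx, unknown_idx)]
    cov_uk = cov[numpy.ix_(unknown_idx, known_idx)]
    cov_kk = cov[numpy.ix_(known_idx, known_idx)]

    # Conditional mean and covariance (Schur complement)
    cond_mean = mu_u + cov_uk @ numpy.linalg.solve(cov_kk, known_vals - mu_k)
    cond_cov = cov_uu - cov_uk @ numpy.linalg.solve(cov_kk, cov_uk.T)
    return rng.multivariate_normal(cond_mean, cond_cov)
```

For a 2D gaussian with mean (0, 1), unit variances, and covariance 0.8, conditioning on the second coordinate being 2 gives a conditional mean of 0 + 0.8 * (2 - 1) = 0.8 for the first coordinate.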
Upvotes: 0
Reputation: 3983
Since for sampling only the relative proportion of the distribution matters, the scaling prefactor can be thrown away. For a diagonal covariance matrix, one can just use the covariance submatrix and mean subvector that have the dimensions of the missing data. For a covariance with off-diagonal elements, the mean and std dev of the sampling gaussian will need to be adjusted by conditioning on the observed values.
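A sketch of the diagonal case (the toy numbers and variable names are my own): with a diagonal covariance the dimensions are independent, so conditioning on the observed values changes nothing, and sampling the missing dimensions is just sampling the corresponding sub-gaussian:

```python
import numpy

# Hypothetical single gaussian with diagonal covariance
mean = numpy.array([1.0, 2.0, 3.0])
cov = numpy.diag([0.5, 1.0, 2.0])

missing_idx = [0, 2]  # dimensions we want to impute

# With a diagonal covariance, the conditional distribution of the
# missing dimensions is just their marginal: take the subvector/submatrix
sub_mean = mean[missing_idx]
sub_cov = cov[numpy.ix_(missing_idx, missing_idx)]

rng = numpy.random.default_rng(0)
sample = rng.multivariate_normal(sub_mean, sub_cov)
```

With off-diagonal elements present, `sub_mean` and `sub_cov` would instead have to be the conditional mean and covariance, as in the full-covariance case above.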
Upvotes: 1