Klops
Klops

Reputation: 1571

Fitting a Gaussian Mixture Model with known share of noise/outliers

Question was moved to stats.stackexchange

A Gaussian Mixture model is fitted by the Expectation-Maximization algorithm.

This fairly simple algorithm consists of two steps and the initialization.

  1. Initialization (for k=2 Gaussians) Find (or guess) an initial muand sigma parameter value for both Gaussians.

  2. E-step Use the current Gaussian parameters to estimate for each data point the likelihood that is comes from Gaussian A (or B). Dividing these likelihoods by the sum of both allows to obtain probabilities (or posteriors) that a point is rather from A than from B. Here prior information could be introduced, if the share of the underlying classes is known.

  3. M-step All mus and sigmas are updated. For example, new mu value for Gaussian A can be calculated by the weighted sum of all data points. The weigths are the posteriors of the previous step. Similarly the sigmas can be derived.

I wonder, how I could extend this algorithm to ignore a known share of data points. For example, I have the external knowledge that my dataset consists of p(A)=70% and p(B)=15% and another 15% of unknown species. I want to fit two Gaussians to describe the chestnut and sunflower distribution and want to allow the algorithm to ignore 15% of the data. This allows for smaller sigmas, as the remaining two Gaussians have no need to be stretched to cover data that is from another unknown distribution.

Note: Since the unknown species are possibly multiple species, I can not introduce a third Gaussian to "absorb" it. Also, the unknown species could be similarly distributed as A, so it is not necessarily the best thing to remove outliers.

So far, I use the known share of A and B as priors in the E-Step but I couldn't find an elegant way to ignore the unknown data. Any ideas? Thank you!

Example with 70% species A, 15% species B and 15% unknown

Example with 70% species A, 15% species B and 15% unknown

Upvotes: 0

Views: 167

Answers (0)

Related Questions