Luigi Biasi

GMM in MATLAB gives different results for the same file

I fit a Gaussian mixture model (GMM) to a dataset in MATLAB:

model = gmdistribution.fit(data,M,'Replicates',5);

with M = 3 Gaussian components. I tested new data with:

[P, l] = posterior(model,new_data);

I ran the program several times and got a different result each time: the log-likelihood value returned for the same new_data changes from run to run. Since I use this log-likelihood to make decisions, this is a problem. What does it depend on, and how can I make it reproducible?

Answers (1)

horchler

First, assuming that you're using a recent version of MATLAB, the gmdistribution.fit documentation notes that the fit method is deprecated and that fitgmdist should be used instead. See the fitgmdist documentation for an example.
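A minimal sketch of the question's workflow rewritten with fitgmdist, assuming a release of the Statistics and Machine Learning Toolbox that provides it. The data, new_data and M values below are synthetic stand-ins, not the asker's data. Note that the second output of posterior is the negative log-likelihood, so smaller values correspond to a higher likelihood.

```matlab
% Synthetic stand-ins for the question's data and new_data (hypothetical values).
rng(0);
data = [randn(100, 2); randn(100, 2) + 4; randn(100, 2) - 4];
new_data = [0 0; 4 4; -4 -4];
M = 3;

% This call mirrors the gmdistribution.fit call in the question.
gm = fitgmdist(data, M, 'Replicates', 5);

% P(i,j) is the posterior probability that row i of new_data belongs to
% component j; nlogL is the NEGATIVE log-likelihood of new_data under gm.
[P, nlogL] = posterior(gm, new_data);
disp(P);
fprintf('Negative log-likelihood: %.4f\n', nlogL);
```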

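Second, the documentation for gmdistribution.fit indicates that a 'Replicates' value larger than 1 requires the 'randSample' start method, which picks the initial component means at random from the data. Because EM only converges to a local maximum of the likelihood, different random starting points can produce different fitted models, and therefore different posterior probabilities and log-likelihood values for the same new_data. This may be the cause (or at least one of the causes) of the variability you observe.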
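If the goal is to remove the random initialization altogether, one option (a sketch, not something the documentation prescribes for this case) is to pass fitgmdist an explicit 'Start' vector of initial component indices, one per row of data. The sort-and-split rule used below to build that vector is a hypothetical choice for illustration; any fixed assignment that gives every component some rows would be equally deterministic. With a fixed start the EM iterations no longer depend on the random seed, although they may converge to a poorer local optimum than the best of several random replicates.

```matlab
% Synthetic stand-in data (hypothetical values).
rng(0);
data = [randn(100, 2); randn(100, 2) + 4; randn(100, 2) - 4];
M = 3;
n = size(data, 1);

% Hypothetical deterministic initialization: sort the rows by the first
% feature and split them into M equal-sized groups. startIdx(i) is the
% initial component (1..M) assigned to row i.
[~, order] = sort(data(:, 1));
startIdx = zeros(n, 1);
startIdx(order) = ceil((1:n)' * M / n);

% Fit twice from the same starting point, changing the global seed in
% between to show that the result no longer depends on it.
rng(1);
gmA = fitgmdist(data, M, 'Start', startIdx);
rng(2);
gmB = fitgmdist(data, M, 'Start', startIdx);

fprintf('Run A: %.6f\nRun B: %.6f\n', ...
    gmA.NegativeLogLikelihood, gmB.NegativeLogLikelihood);
```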

Finally, you can also try calling rng before gmdistribution.fit to set the seed of the global random number stream (assuming the function doesn't use its own stream internally). Alternatively, you can try passing an 'Options' structure created with statset:

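seed = 1;
% Create a dedicated Mersenne Twister stream with a fixed seed
s = RandStream('mt19937ar','Seed',seed);
% Pass the stream to the fit through the 'Options' structure (untested)
opts = statset('Streams',s);
model = gmdistribution.fit(data,M,'Replicates',5,'Options',opts);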

I can't fully test this myself; see the gmdistribution class documentation for further details.
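A minimal sketch of the rng suggestion, written with fitgmdist and assuming it draws its initial values from the global random number stream (the fitgmdist documentation examples call rng for reproducibility). The synthetic data and the seed value are hypothetical. Fitting twice with the same seed should give the same model, and hence the same log-likelihood for new_data. If you change the seed, keep in mind that mixture components have no intrinsic order, so the columns of P can come out permuted even when the fit is equally good; comparing nlogL is more robust than comparing P column by column.

```matlab
% Synthetic stand-ins for the question's data and new_data (hypothetical values).
rng(0);
data = [randn(100, 2); randn(100, 2) + 4; randn(100, 2) - 4];
new_data = [0 0; 4 4; -4 -4];
M = 3;
seed = 1;  % any fixed seed

% Reset the global generator to the same seed before each fit.
rng(seed);
gm1 = fitgmdist(data, M, 'Replicates', 5);
rng(seed);
gm2 = fitgmdist(data, M, 'Replicates', 5);

% Both fits start from identical random draws, so they should agree.
[P1, nlogL1] = posterior(gm1, new_data);
[P2, nlogL2] = posterior(gm2, new_data);
fprintf('Run 1: %.6f\nRun 2: %.6f\n', nlogL1, nlogL2);
```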
