Reputation: 59
I constructed a Gaussian Mixture Model in Matlab with a dataset:
model = gmdistribution.fit(data,M,'Replicates',5);
with M = 3 Gaussian components. I tested new data with:
[P, l] = posterior(model,new_data);
I ran the program several times and didn't get the same result. Each run produces different log-likelihood values. I use the log-likelihood to make decisions, and this value for the same data (new_data) differs for each run. What does it depend on? How can I resolve this problem?
Upvotes: 1
Views: 438
Reputation: 18504
First, assuming that you're using a recent version of Matlab, the gmdistribution.fit documentation indicates that the fit method is deprecated and that fitgmdist should be used instead. See the fitgmdist documentation for an example.
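As a rough sketch of what the switch to fitgmdist might look like, assuming a release where that function is available: the data below is synthetic stand-in data (three well-separated 2-D clusters), not the dataset from the question, and the variable names simply mirror the ones used there.

```matlab
% Hypothetical stand-in data: three 2-D clusters, 200 points each.
% (Assumption for illustration only; replace with the real dataset.)
rng(0);                                   % make the synthetic data repeatable
data = [randn(200,2); randn(200,2) + 5; randn(200,2) - 5];
new_data = randn(50,2);                   % hypothetical test points
M = 3;                                    % number of Gaussian components

% Same call shape as gmdistribution.fit(data,M,'Replicates',5)
model = fitgmdist(data, M, 'Replicates', 5);

% Note: the second output of posterior is the NEGATIVE log-likelihood
% of new_data under the fitted model.
[P, nlogL] = posterior(model, new_data);
```

The name-value arguments carry over unchanged, so the 'Replicates' and 'Options' settings discussed here should apply to either function.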
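Second, the documentation for gmdistribution.fit indicates that if the 'Replicates' option is larger than 1, the 'randSample' start method will be used to produce the initial parameters. Because each replicate then starts from randomly chosen initial parameters, the EM algorithm can converge to a different local optimum on each run. This may be the cause (or at least one of the causes) of your observed variability.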
Finally, you can also try calling rng before gmdistribution.fit to set the seed of the global random number stream (assuming the function doesn't use its own stream internally). Alternatively, you can try specifying an 'Options' parameter via statset:
seed = 1;
s = RandStream('mt19937ar','Seed',seed);
opts = statset('Streams',s);
model = gmdistribution.fit(data,M,'Replicates',5,'Options',opts);
I can't test this fully myself; see the gmdistribution class documentation for further details.
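One way to check whether seeding actually removes the variability is to fit twice from the same seed and compare the values returned by posterior. This is only a sketch under assumptions: it uses fitgmdist rather than gmdistribution.fit, the data is synthetic stand-in data, and it relies on the fit drawing from the global stream with no parallel options set.

```matlab
% Hypothetical stand-in data (assumption; not the dataset from the question).
rng(0);
data = [randn(200,2); randn(200,2) + 5; randn(200,2) - 5];
new_data = randn(50,2);
M = 3;

% First fit: reset the global random stream immediately before fitting.
rng(1);
model1 = fitgmdist(data, M, 'Replicates', 5);
[~, nlogL1] = posterior(model1, new_data);   % negative log-likelihood

% Second fit: reset to the same seed so the random starts are identical.
rng(1);
model2 = fitgmdist(data, M, 'Replicates', 5);
[~, nlogL2] = posterior(model2, new_data);

% If seeding works as hoped, the difference should be zero (or negligible).
fprintf('Difference between runs: %g\n', abs(nlogL1 - nlogL2));
```

If the difference is not negligible, the fit is probably drawing random numbers from somewhere other than the global stream, and passing an explicit stream through statset as shown earlier would be the next thing to try.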
Upvotes: 0