Tuan Anh Hoang-Vu

Reputation: 1995

EM soft clustering in LingPipe

LingPipe's EM tutorial says that it is possible to run the algorithm with no supervised data:

It is possible to train a classifier in a completely unsupervised fashion by having the initial classifier assign categories at random. Only the number of categories must be fixed. The algorithm is exactly the same, and the result after convergence or the maximum number of epochs is a classifier.

But their class, TradNaiveBayesClassifier, requires both a labeled and an unlabeled corpus to run. How can I modify it to run with no labeled data?

Upvotes: -1

Views: 159

Answers (1)

dragonxlwang

Reputation: 482

EM is an iterative algorithm for maximum likelihood estimation in probabilistic models with latent variables. It is commonly applied to unsupervised models used for clustering, such as PLSA and Gaussian mixture models.

I think the LingPipe doc is saying that you can randomly initialize the labels of all the data (a distribution over labels for each item), feed them into naive Bayes to compute the ELBO (evidence lower bound), and then maximize it to update the parameters.

In short, you will need to use naive Bayes to write the M step, which updates the model parameters.
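To make the idea concrete, here is a rough sketch of that loop in plain Python. It is not LingPipe's API and not how TradNaiveBayesClassifier is implemented: the function name `em_naive_bayes`, its parameters, and the additive smoothing constant `alpha` are all assumptions made for illustration. It starts from random soft labels, fits a multinomial naive Bayes model to them (M step), then recomputes the soft labels from the model (E step).

```python
import math
import random
from collections import Counter


def em_naive_bayes(docs, num_categories, epochs=50, alpha=1.0, seed=0):
    """Hypothetical sketch: unsupervised EM for multinomial naive Bayes.

    docs is a list of token lists. Returns the soft label distribution per
    document and the data log-likelihood recorded at each epoch.
    """
    rng = random.Random(seed)
    vocab = sorted({tok for doc in docs for tok in doc})
    counts = [Counter(doc) for doc in docs]

    # Random initialization: one random label distribution per document.
    resp = []
    for _ in docs:
        weights = [rng.random() + 1e-3 for _ in range(num_categories)]
        total = sum(weights)
        resp.append([w / total for w in weights])

    log_likelihoods = []
    for _ in range(epochs):
        # M step: re-estimate category priors and per-category word
        # probabilities from the soft labels (alpha is an assumed smoother).
        log_prior, log_word = [], []
        for k in range(num_categories):
            mass = sum(r[k] for r in resp)
            log_prior.append(
                math.log((mass + alpha) / (len(docs) + alpha * num_categories))
            )
            word_mass = {w: alpha for w in vocab}
            for r, c in zip(resp, counts):
                for w, n in c.items():
                    word_mass[w] += r[k] * n
            total = sum(word_mass.values())
            log_word.append({w: math.log(m / total) for w, m in word_mass.items()})

        # E step: recompute each document's label distribution under the
        # current parameters, using log-sum-exp for numerical stability.
        new_resp, ll = [], 0.0
        for c in counts:
            scores = [
                log_prior[k] + sum(n * log_word[k][w] for w, n in c.items())
                for k in range(num_categories)
            ]
            top = max(scores)
            exps = [math.exp(s - top) for s in scores]
            z = sum(exps)
            new_resp.append([e / z for e in exps])
            ll += top + math.log(z)  # log-likelihood of this document
        resp = new_resp
        log_likelihoods.append(ll)
    return resp, log_likelihoods


# Toy usage: two groups of documents with disjoint vocabularies.
docs = [
    ["cat", "cat", "dog"], ["cat", "dog", "dog"], ["cat", "dog"],
    ["car", "bus", "bus"], ["car", "car", "bus"], ["car", "bus"],
]
resp, log_likelihoods = em_naive_bayes(docs, num_categories=2)
clusters = [max(range(2), key=lambda k: r[k]) for r in resp]
```

Each row of `resp` is the soft cluster assignment for one document. Which cluster index a group ends up with depends on the random initialization, so the indices themselves carry no meaning. In LingPipe you would express the same loop with their classes rather than hand-rolled dictionaries.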

Upvotes: 0
