Dchris

Reputation: 3047

Use clustering for prediction in Weka

Can I use clustering (e.g. using k-means) to make predictions in Weka?

I have some data from a survey on presidential elections. I have the answers from questionnaires (numeric attributes), and one attribute that holds the answer to the question "Who are you going to vote for?" (1, 2 or 3).

I make predictions using some classifiers (e.g. Bayes) in Weka. My results are based on that answer (vote intention), and I get about 60% recall (rate of correct predictions).

I understand that clustering is a different thing, but can I use clustering to make predictions? I've already tried it, but I noticed that clustering always selects its own centroids and does not use my vote intention attribute.

Upvotes: 2

Views: 6269

Answers (2)

Khaled Alanezi

Reputation: 361

Yes. You can use the Weka interface to do prediction via clustering. First, load your training data in the Preprocess tab. Then go to the Classify tab, click Choose under Classifier, and under meta select ClassificationViaClustering. The default clustering algorithm used by Weka is SimpleKMeans, but you can change that: click the options string (i.e. the text next to the Choose button), and in the dialog that opens click Choose to pick from a list of clustering algorithms (e.g. EM). After that, you can run cross-validation or supply a test set by clicking Set, as you normally do when you use Weka for classification.
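To make the idea concrete, here is a minimal pure-Python sketch of classification via clustering. It is not Weka's implementation, and the function names (`kmeans`, `fit`, `predict`) and the toy data are made up for illustration. It assumes a plain k-means seeded with the first k points and labels each cluster by majority vote.

```python
import math
from collections import Counter

def nearest(point, centers):
    """Index of the center closest to point (Euclidean distance)."""
    return min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))

def kmeans(points, k, iterations=20):
    """Plain Lloyd's k-means; the first k points seed the centers (an arbitrary choice)."""
    centers = [list(p) for p in points[:k]]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centers)].append(p)
        for i, group in enumerate(groups):
            if group:  # keep the old center if a cluster ends up empty
                centers[i] = [sum(col) / len(group) for col in zip(*group)]
    return centers

def fit(points, labels, k):
    """Cluster without looking at the labels, then name each cluster by majority vote."""
    centers = kmeans(points, k)
    votes = [Counter() for _ in range(k)]
    for p, y in zip(points, labels):
        votes[nearest(p, centers)][y] += 1
    cluster_label = [v.most_common(1)[0][0] if v else None for v in votes]
    return centers, cluster_label

def predict(point, centers, cluster_label):
    """Predict the label of the cluster whose center is nearest to the point."""
    return cluster_label[nearest(point, centers)]

# Toy stand-in for questionnaire answers (two numeric attributes)
# and the vote intention attribute (1, 2 or 3).
X = [(1.0, 1.0), (5.0, 5.0), (9.0, 1.0), (1.2, 0.8), (5.1, 4.9), (8.8, 1.2)]
y = [1, 2, 3, 1, 2, 3]
centers, cluster_label = fit(X, y, k=3)
print(predict((1.1, 1.1), centers, cluster_label))
```

The class attribute is only used after clustering, to name the clusters. That is why this approach works well only when the clusters happen to line up with the classes.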

Hope this will help anyone having the same question!

Upvotes: 0

Has QUIT--Anony-Mousse

Reputation: 77454

The author of "Explain results of K-means" must be a colleague of yours. He seems to use the same data set, and it would be helpful if we could all have a look at the data.

In general, clustering is not classification or prediction.

However, you can try to improve your classification by using the information gained from clustering. Two such techniques:

  • substitute your data set with the cluster centers, and use this for classification (at least if your clusters are reasonably pure with respect to the class label!)
  • train a separate classifier on each cluster, and build an ensemble out of them (in particular, if your clusters are inhomogeneous)
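As a rough illustration of the second technique, the sketch below routes each instance to its nearest cluster center and lets a classifier trained only on that cluster decide. Everything here is an assumption for illustration: the cluster centers are taken as already computed, the per-cluster model is a simple nearest-class-mean classifier, and the helper names and toy data are hypothetical.

```python
import math
from collections import defaultdict

def nearest(point, centers):
    """Index of the center closest to point (Euclidean distance)."""
    return min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))

def class_means(points, labels):
    """Nearest-class-mean model: one mean vector per class label."""
    by_class = defaultdict(list)
    for p, y in zip(points, labels):
        by_class[y].append(p)
    return {y: [sum(col) / len(ps) for col in zip(*ps)] for y, ps in by_class.items()}

def fit_per_cluster(points, labels, centers):
    """Route each training point to its cluster and train one model per cluster."""
    members = defaultdict(lambda: ([], []))
    for p, y in zip(points, labels):
        xs, ys = members[nearest(p, centers)]
        xs.append(p)
        ys.append(y)
    return {c: class_means(xs, ys) for c, (xs, ys) in members.items()}

def predict(point, centers, models):
    """Pick the cluster first, then let that cluster's model choose the class."""
    model = models.get(nearest(point, centers))
    if model is None:  # no training data fell into this cluster
        return None
    return min(model, key=lambda y: math.dist(point, model[y]))

# Hypothetical, precomputed cluster centers; both clusters mix two classes.
centers = [(0.0, 0.0), (10.0, 10.0)]
X = [(0.0, 1.0), (1.0, 0.0), (-0.5, 1.5), (1.5, -0.5),
     (10.0, 11.0), (11.0, 10.0), (9.5, 11.5), (11.5, 9.5)]
y = [1, 2, 1, 2, 3, 1, 3, 1]
models = fit_per_cluster(X, y, centers)
print(predict((10.0, 12.0), centers, models))
```

Note that class 1 appears in both clusters but is separated from a different rival class in each. This is the situation where local models can help over a single global one.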

But I believe your understanding of classification and clustering is not yet far enough along to try these out. You need to handle them carefully, and know your data very well.

Upvotes: 3
