Danny

Reputation: 11

How do I choose the best k-means clustering in Weka?

As you can see in the results at the bottom, I have two different clusterings produced with different seeds. I would like to choose the better of the two.

I know that a lower squared error is better. However, both runs show almost the same squared error even though I used different seeds. I want to know why the squared errors are so similar. I also want to know what else I need to consider when I am selecting the best clustering.

*******************************************************************
kMeans
======

Number of iterations: 10
Within cluster sum of squared errors: 527.6988818392938
Missing values globally replaced with mean/mode

Cluster centroids:
                                  Cluster#
Attribute             Full Data          0          1
                         (4898)     (2781)     (2117)
=====================================================
fixedacidity             6.8548     6.9565     6.7212
volatileacidity          0.2782     0.2826     0.2725
citricacid               0.3342     0.3389     0.3279
residualsugar            6.3914     8.2678     3.9265
chlorides                0.0458     0.0521     0.0374
freesulfurdioxide       35.3081    38.6897    30.8658
totalsulfurdioxide     138.3607   155.2585   116.1627
density                   0.994     0.9958     0.9916
pH                       3.1883     3.1691     3.2134
sulphates                0.4898      0.492     0.4871
alcohol                 10.5143     9.6325    11.6726
quality                  5.8779     5.4779     6.4034




Time taken to build model (full training data) : 0.19 seconds

=== Model and evaluation on training set ===

Clustered Instances

0      2781 ( 57%)
1      2117 ( 43%)


***********************************************************************



kMeans
======

Number of iterations: 7
Within cluster sum of squared errors: 527.6993178146143
Missing values globally replaced with mean/mode

Cluster centroids:
                                  Cluster#
Attribute             Full Data          0          1
                         (4898)     (2122)     (2776)
=====================================================
fixedacidity             6.8548     6.7208     6.9572
volatileacidity          0.2782     0.2723     0.2828
citricacid               0.3342     0.3281     0.3389
residualsugar            6.3914     3.9451     8.2614
chlorides                0.0458     0.0374     0.0522
freesulfurdioxide       35.3081    30.9105    38.6697
totalsulfurdioxide     138.3607   116.2175   155.2871
density                   0.994     0.9917     0.9958
pH                       3.1883     3.2137     3.1689
sulphates                0.4898     0.4876     0.4916
alcohol                 10.5143    11.6695     9.6312
quality                  5.8779     6.4043     5.4755




Time taken to build model (full training data) : 0.15 seconds

=== Model and evaluation on training set ===

Clustered Instances

0      2122 ( 43%)
1      2776 ( 57%)

Upvotes: 0

Views: 1066

Answers (2)

adranale

Reputation: 2874

Using different seeds does not guarantee different clusters in the result.
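One way to check this on the output in the question is to line up the centroids of the two runs while ignoring the arbitrary cluster numbers. The sketch below is plain Python, not Weka code; the centroid values are copied from the question for four of the attributes, and the helper names (`max_abs_diff`, `best_label_matching`) are made up for illustration.

```python
from itertools import permutations

# Centroids copied from the two Weka runs in the question, restricted to
# four attributes: residualsugar, totalsulfurdioxide, alcohol, quality.
run_a = [
    [8.2678, 155.2585, 9.6325, 5.4779],   # run 1, cluster 0
    [3.9265, 116.1627, 11.6726, 6.4034],  # run 1, cluster 1
]
run_b = [
    [3.9451, 116.2175, 11.6695, 6.4043],  # run 2, cluster 0
    [8.2614, 155.2871, 9.6312, 5.4755],   # run 2, cluster 1
]

def max_abs_diff(c1, c2):
    """Largest per-attribute gap between two centroids."""
    return max(abs(x - y) for x, y in zip(c1, c2))

def best_label_matching(a, b):
    """Try every relabelling of b; return (permutation, worst centroid gap)."""
    best = None
    for perm in permutations(range(len(b))):
        gap = max(max_abs_diff(a[i], b[j]) for i, j in enumerate(perm))
        if best is None or gap < best[1]:
            best = (perm, gap)
    return best

perm, gap = best_label_matching(run_a, run_b)
print(perm, round(gap, 4))  # prints: (1, 0) 0.0548
```

The best matching pairs cluster 0 of the first run with cluster 1 of the second, and the centroids then differ by less than 0.06 on every attribute checked. In other words, the two seeds appear to have found essentially the same partition with the labels swapped, which would also explain the nearly identical squared error.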

Upvotes: 0

Has QUIT--Anony-Mousse

Reputation: 77454

Define "best result".

By the definition of k-means, a lower sum of squares is better.
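If you want to apply that rule mechanically, you can run k-means with several seeds and keep the run with the lowest within-cluster sum of squares. Here is a minimal sketch in plain Python, assuming a basic Lloyd-style k-means with random initial centroids; it is not Weka's implementation, and the toy data, the seed list, and the function names are invented for the example.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))
            for p in points]

def kmeans(points, k, seed, max_iter=100):
    """Basic Lloyd iteration; returns (wcss, centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # seed-dependent initialisation
    for _ in range(max_iter):
        labels = assign(points, centroids)
        new_centroids = []
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    labels = assign(points, centroids)
    wcss = sum(sq_dist(p, centroids[l]) for p, l in zip(points, labels))
    return wcss, centroids, labels

# Toy data (made up): two small blobs in 2-D.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1),
          (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]

results = {seed: kmeans(points, 2, seed) for seed in (1, 2, 3, 4, 5)}
best_seed = min(results, key=lambda s: results[s][0])
print(best_seed, results[best_seed][0])
```

When several seeds end up with practically the same value, as in the question, there is nothing left to choose between them by this criterion.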

Anything else is worse by the k-means criterion, but that doesn't mean a different quality criterion (or clustering algorithm) couldn't be more helpful for your actual problem.
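As one example of a different criterion (my pick for illustration, there are many others), the mean silhouette width compares how close each point is to its own cluster versus the nearest other cluster. The sketch below is plain Python on made-up toy data, not Weka code; `mean_silhouette` is a hypothetical helper name, and it needs Python 3.8+ for `math.dist`.

```python
from math import dist  # Python 3.8+

def mean_silhouette(points, labels):
    """Average silhouette width over all points (range -1 to 1)."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)

    total = 0.0
    for p, l in zip(points, labels):
        own = clusters[l]
        if len(own) == 1:
            continue  # a singleton contributes 0 by convention
        # a: mean distance to the other members of the same cluster
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(sum(dist(p, q) for q in members) / len(members)
                for other, members in clusters.items() if other != l)
        total += (b - a) / max(a, b)
    return total / len(points)

# Toy data (made up): two small blobs in 2-D.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1),
          (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]

good = mean_silhouette(points, [0, 0, 0, 1, 1, 1])  # one label per blob
bad = mean_silhouette(points, [0, 1, 0, 1, 0, 1])   # labels mixed across blobs
print(good > bad)  # prints: True
```

A higher value suggests better separated clusters, but whether that matches what you actually need still depends on your problem.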

Upvotes: 2
