Aquarius24

Reputation: 1866

Decide the best 'k' for the k-means algorithm in Weka

I am using the k-means algorithm for clustering, but I am not sure how to choose the best value of k based on the results. For example, I applied k-means to a dataset with k=10:

kMeans
======

Number of iterations: 16
Within cluster sum of squared errors: 38.47923197081721
Missing values globally replaced with mean/mode

Cluster centroids:
                                                         Cluster#
Attribute                          Full Data                    0                    1                    2                    3                    4                    5                    6                    7                    8                    9
                                       (214)                 (16)                  (9)                 (13)                 (23)                 (46)                 (12)                 (11)                 (40)                 (15)                 (29)
==============================================================================================================================================================================================================================================================
RI                                    1.5184               1.5181               1.5175               1.5189               1.5178               1.5172                1.519               1.5255               1.5175               1.5222               1.5171
Na                                   13.4079              12.9988              14.6467              12.8277              13.2148              13.1896                13.63              12.6318              13.0518              13.9107              14.4421
Mg                                    2.6845               3.4894               1.3056               0.7738               3.4261               3.4987               3.4917               0.2145               3.4958               3.8273               0.5383
Al                                    1.4449               1.1844               1.3667               2.0338               1.3552               1.4898               1.3308               1.1891               1.2617                0.716               2.1228
Si                                   72.6509               72.785              73.2067              72.3662              72.6526              72.6989                72.07              72.0709              72.9532              71.7467              72.9659
K                                     0.4971               0.4794                    0                 1.47                0.527                 0.59               0.4108               0.2345                0.547               0.1007               0.3252
Ca                                     8.957               8.8069               9.3567              10.1238               8.5648               8.3041                 8.87              13.1291               8.5035               9.5887               8.4914
Ba                                     0.175                0.015                    0               0.1877                0.023                0.003               0.0667               0.2864                    0                    0                 1.04
Fe                                     0.057               0.2238                    0               0.0608               0.2013               0.0104               0.0167               0.1109                0.011               0.0313               0.0134
Type                    build wind non-float     build wind float            tableware           containers build wind non-float build wind non-float     build wind float build wind non-float     build wind float     build wind float            headlamps

Upvotes: 0

Views: 1833

Answers (1)

Dinesh

Reputation: 239

There are various methods for choosing the optimal value of "k" in the k-means algorithm: the rule of thumb, the elbow method, the silhouette method, etc. In my own work I relied on the result of the elbow method and it worked well for me; I did all of the analysis in R. Here is a link describing those methods: link. Look through the sub-links on that page, write code for any one of the methods, and apply it to your data.
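To give a rough idea of the elbow method: run k-means for k = 1, 2, 3, ... and record the "Within cluster sum of squared errors" value (the same quantity Weka prints in its output) for each k, then look for the k after which that value stops dropping sharply. Below is a minimal Python sketch of that idea, not Weka's or R's implementation. The toy data, the function names (`farthest_first`, `kmeans_wcss`, `elbow_curve`) and the deterministic farthest-first seeding are all assumptions made so the example is small and repeatable; real tools usually seed randomly and may normalize attributes first.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def farthest_first(points, k):
    """Deterministic seeding (an assumption of this sketch): start at
    points[0], then repeatedly add the point farthest from the chosen seeds."""
    seeds = [points[0]]
    while len(seeds) < k:
        seeds.append(max(points, key=lambda p: min(sq_dist(p, s) for s in seeds)))
    return seeds

def kmeans_wcss(points, k, max_iter=100):
    """Run Lloyd's k-means and return the within-cluster sum of squared errors."""
    centroids = farthest_first(points, k)
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid becomes the mean of its members
        # (an empty cluster keeps its previous centroid).
        updated = [
            tuple(sum(col) / len(members) for col in zip(*members))
            if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    return sum(min(sq_dist(p, c) for c in centroids) for p in points)

def elbow_curve(points, max_k):
    """Map each k in 1..max_k to the within-cluster sum of squared errors."""
    return {k: kmeans_wcss(points, k) for k in range(1, max_k + 1)}

# Hypothetical toy data: three well-separated 2-D blobs of 30 points each.
rng = random.Random(42)
blob_centers = [(0.0, 0.0), (6.0, 8.0), (20.0, 0.0)]
points = [
    (cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5))
    for cx, cy in blob_centers
    for _ in range(30)
]

curve = elbow_curve(points, 6)
for k, wcss in curve.items():
    print(f"k={k}  WCSS={wcss:.2f}")
```

On this toy data the error falls steeply up to k=3 and only slightly afterwards, so the "elbow" is at k=3. On real data the bend is usually less clear-cut, so it is worth plotting the curve and judging it by eye rather than trusting a single number.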

I hope this helps; if not, I am sorry.

All the best with your work.

Upvotes: 1
