Reputation: 31
I previously replaced missing values, transformed variables, and deleted redundant values. The code I ran:
from sklearn.metrics import silhouette_samples, silhouette_score
from sklearn.cluster import KMeans

range_n_clusters=[1,2,3,4,5]
for n_clusters in range_n_clusters:
    clusterer = KMeans(n_clusters=n_clusters, random_state=10)
    cluster_labels = clusterer.fit_predict(df)

    silhouette_avg = silhouette_score(df, cluster_labels)
    print('For n_clusters=', n_clusters,
          'The aversge silhouette_score is :', silhouette_avg)
    sample_silhouette_values = silhouette_samples(df, cluster_labels)
The error:
ValueError Traceback (most recent call last)
<ipython-input-40-1bd61ca1e514> in <module>
7 cluster_labels=clusterer.fit_predict(df)
8
----> 9 silhouette_avg=silhouette_score(df, cluster_labels)
10 print('For n_clusters=', n_clusters,
11 'The aversge silhouette_score is :', silhouette_avg)
~\anaconda3\lib\site-packages\sklearn\utils\validation.py in inner_f(*args, **kwargs)
71 FutureWarning)
72 kwargs.update({k: arg for k, arg in zip(sig.parameters, args)})
---> 73 return f(**kwargs)
74 return inner_f
75
~\anaconda3\lib\site-packages\sklearn\metrics\cluster\_unsupervised.py in silhouette_score(X, labels, metric, sample_size, random_state, **kwds)
115 else:
116 X, labels = X[indices], labels[indices]
--> 117 return np.mean(silhouette_samples(X, labels, metric=metric, **kwds))
118
119
~\anaconda3\lib\site-packages\sklearn\utils\validation.py in inner_f(*args, **kwargs)
71 FutureWarning)
72 kwargs.update({k: arg for k, arg in zip(sig.parameters, args)})
---> 73 return f(**kwargs)
74 return inner_f
75
~\anaconda3\lib\site-packages\sklearn\metrics\cluster\_unsupervised.py in silhouette_samples(X, labels, metric, **kwds)
227 n_samples = len(labels)
228 label_freqs = np.bincount(labels)
--> 229 check_number_of_labels(len(le.classes_), n_samples)
230
231 kwds['metric'] = metric
~\anaconda3\lib\site-packages\sklearn\metrics\cluster\_unsupervised.py in check_number_of_labels(n_labels, n_samples)
32 """
33 if not 1 < n_labels < n_samples:
---> 34 raise ValueError("Number of labels is %d. Valid values are 2 "
35 "to n_samples - 1 (inclusive)" % n_labels)
36
ValueError: Number of labels is 1. Valid values are 2 to n_samples - 1 (inclusive)
Upvotes: 3
Views: 3311
Reputation:
The error is raised on this line; I ran into it too:
silhouette_avg=silhouette_score(df, cluster_labels)
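To see why that line fails, here is a minimal sketch on synthetic data (the array `X` is a stand-in I made up, not the asker's `df`). With `n_clusters=1`, KMeans itself runs fine, but every sample gets the same label, and `silhouette_score` then rejects the input:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic stand-in for the asker's df (an assumption, not the original data)
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))

# KMeans accepts a single cluster; all samples end up with label 0
labels = KMeans(n_clusters=1, n_init=10, random_state=10).fit_predict(X)
n_unique = len(np.unique(labels))

# The silhouette needs at least 2 distinct labels, so this raises ValueError
try:
    silhouette_score(X, labels)
    raised = False
except ValueError as exc:
    raised = True
    print("silhouette_score failed:", exc)
```

So the fit is not the problem; the score is simply undefined when there is only one cluster.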
Upvotes: 0
Reputation: 46898
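The silhouette score is only defined for at least 2 clusters (and at most n_samples - 1). KMeans itself will run with k=1, but then every sample gets the same label, so there is no other cluster to compare against. If you start the range at 2, as in the code below (pay attention to the indentation), it should work: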
from sklearn import datasets
iris = datasets.load_iris()
df = iris.data

from sklearn.metrics import silhouette_samples, silhouette_score
from sklearn.cluster import KMeans

range_n_clusters=[2,3,4,5]
for n_clusters in range_n_clusters:
    clusterer = KMeans(n_clusters=n_clusters, random_state=10)
    cluster_labels = clusterer.fit_predict(df)
    silhouette_avg = silhouette_score(df, cluster_labels)
    print('For n_clusters=', n_clusters,'The aversge silhouette_score is :', silhouette_avg)
    sample_silhouette_values = silhouette_samples(df, cluster_labels)
For n_clusters= 2 The aversge silhouette_score is : 0.681046169211746
For n_clusters= 3 The aversge silhouette_score is : 0.5528190123564091
For n_clusters= 4 The aversge silhouette_score is : 0.4980505049972867
For n_clusters= 5 The aversge silhouette_score is : 0.4887488870931048
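If you want to keep 1 in the candidate list, one option is to skip any k the silhouette cannot handle and then pick the k with the highest average score. This is only a sketch under my own assumptions: the names `scores` and `best_k` are made up for illustration, and `n_init=10` is set explicitly just to keep the behaviour the same across scikit-learn versions.

```python
from sklearn import datasets
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

X = datasets.load_iris().data
n_samples = X.shape[0]

scores = {}  # hypothetical helper: maps k -> average silhouette score
for k in [1, 2, 3, 4, 5]:
    # The silhouette is only defined for 2 <= number of labels <= n_samples - 1
    if not 2 <= k <= n_samples - 1:
        continue
    labels = KMeans(n_clusters=k, n_init=10, random_state=10).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

# Choose the k with the highest average silhouette score
best_k = max(scores, key=scores.get)
print("best k:", best_k, "score:", scores[best_k])
```

A higher average silhouette means tighter, better separated clusters, but it is a heuristic; for iris it favours k=2 even though the dataset has three species.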
Upvotes: 2