user14625427

Reputation: 31

Error: Number of labels is 1. Valid values are 2 to n_samples - 1 (inclusive)

I previously replaced missing values, transformed variables, and deleted redundant values. Then I ran this code:

from sklearn.metrics import silhouette_samples, silhouette_score
from sklearn.cluster import KMeans

range_n_clusters=[1,2,3,4,5]
for n_clusters in range_n_clusters:
    clusterer =KMeans(n_clusters=n_clusters, random_state=10)
    cluster_labels=clusterer.fit_predict(df)

    silhouette_avg=silhouette_score(df, cluster_labels)
    print('For n_clusters=', n_clusters,
          'The aversge silhouette_score is :', silhouette_avg)

    sample_silhouette_values = silhouette_samples(df, cluster_labels)

The error:

ValueError                                Traceback (most recent call last)
<ipython-input-40-1bd61ca1e514> in <module>
  7     cluster_labels=clusterer.fit_predict(df)
  8 
----> 9     silhouette_avg=silhouette_score(df, cluster_labels)
 10     print('For n_clusters=', n_clusters,
 11          'The aversge silhouette_score is :', silhouette_avg)

~\anaconda3\lib\site-packages\sklearn\utils\validation.py in inner_f(*args, **kwargs)
 71                           FutureWarning)
 72         kwargs.update({k: arg for k, arg in zip(sig.parameters, args)})
 ---> 73         return f(**kwargs)
 74     return inner_f
 75 

 ~\anaconda3\lib\site-packages\sklearn\metrics\cluster\_unsupervised.py in silhouette_score(X, labels, metric, sample_size, random_state, **kwds)
115         else:
116             X, labels = X[indices], labels[indices]
--> 117     return np.mean(silhouette_samples(X, labels, metric=metric, **kwds))
118 
119 

~\anaconda3\lib\site-packages\sklearn\utils\validation.py in inner_f(*args, **kwargs)
 71                           FutureWarning)
 72         kwargs.update({k: arg for k, arg in zip(sig.parameters, args)})
 ---> 73         return f(**kwargs)
 74     return inner_f
 75 

~\anaconda3\lib\site-packages\sklearn\metrics\cluster\_unsupervised.py in silhouette_samples(X, labels, metric, **kwds)
227     n_samples = len(labels)
228     label_freqs = np.bincount(labels)
--> 229     check_number_of_labels(len(le.classes_), n_samples)
230 
231     kwds['metric'] = metric

~\anaconda3\lib\site-packages\sklearn\metrics\cluster\_unsupervised.py in check_number_of_labels(n_labels, n_samples)
 32     """
 33     if not 1 < n_labels < n_samples:
 ---> 34         raise ValueError("Number of labels is %d. Valid values are 2 "
 35                          "to n_samples - 1 (inclusive)" % n_labels)
 36 

ValueError: Number of labels is 1. Valid values are 2 to n_samples - 1 (inclusive)

Upvotes: 3

Views: 3311

Answers (2)

user21248435

Reputation:

The error is raised on this line; I ran into it too:

silhouette_avg=silhouette_score(df, cluster_labels)  
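
To make it concrete why that line fails, here is a minimal sketch. It assumes the iris dataset as stand-in data, since the asker's `df` is not shown. `KMeans` itself runs fine with `n_clusters=1`; the `ValueError` only appears once the single-label result is passed to `silhouette_score`.

```python
from sklearn import datasets
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Stand-in data (assumption): the original df is not available.
X = datasets.load_iris().data

# Fitting with a single cluster works; every sample gets the same label.
labels = KMeans(n_clusters=1, n_init=10, random_state=10).fit_predict(X)
n_unique = len(set(labels))

# The silhouette score needs at least 2 distinct labels, so this raises.
try:
    silhouette_score(X, labels)
    raised = False
    message = ""
except ValueError as exc:
    raised = True
    message = str(exc)
    print(message)
```

So the fix is not on that line itself but in which values of `n_clusters` reach it.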

Upvotes: 0

StupidWolf

Reputation: 46898

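The silhouette score is only defined for at least 2 clusters (and at most n_samples - 1). KMeans itself will run with k=1, but then every sample gets the same label, so there is no neighbouring cluster to compare against and silhouette_score raises the error. So if you start the range at 2 and run the code below (pay attention to the indentation), it should work: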

from sklearn import datasets
iris = datasets.load_iris()
df = iris.data

from sklearn.metrics import silhouette_samples, silhouette_score  
from sklearn.cluster import KMeans 

range_n_clusters=[2,3,4,5]    

for n_clusters in range_n_clusters:   
    clusterer =KMeans(n_clusters=n_clusters, random_state=10)  
    cluster_labels=clusterer.fit_predict(df)  
    
    silhouette_avg=silhouette_score(df, cluster_labels)  
    print('For n_clusters=', n_clusters,'The aversge silhouette_score is :', silhouette_avg) 
    sample_silhouette_values = silhouette_samples(df, cluster_labels)


For n_clusters= 2 The aversge silhouette_score is : 0.681046169211746
For n_clusters= 3 The aversge silhouette_score is : 0.5528190123564091
For n_clusters= 4 The aversge silhouette_score is : 0.4980505049972867
For n_clusters= 5 The aversge silhouette_score is : 0.4887488870931048
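
If you would rather keep 1 in the range, for example because the list of k values comes from somewhere else, one option is to skip the score when it is undefined. The following is only a sketch: `safe_silhouette` is a hypothetical helper name, not part of scikit-learn, and iris is again used as example data.

```python
import numpy as np
from sklearn import datasets
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def safe_silhouette(X, labels):
    """Hypothetical helper: return the silhouette score, or None when
    the number of distinct labels is outside 2 .. n_samples - 1."""
    n_labels = len(np.unique(labels))
    if not 1 < n_labels < len(labels):
        return None
    return silhouette_score(X, labels)


X = datasets.load_iris().data

scores = {}
for k in [1, 2, 3, 4, 5]:
    labels = KMeans(n_clusters=k, n_init=10, random_state=10).fit_predict(X)
    scores[k] = safe_silhouette(X, labels)  # None for k=1

# Keep only the k values with a defined score and pick the highest one.
valid = {k: s for k, s in scores.items() if s is not None}
best_k = max(valid, key=valid.get)
print(scores, best_k)
```

The check mirrors the condition shown in the traceback (`1 < n_labels < n_samples`), so k=1 is recorded as `None` instead of stopping the loop. On iris the highest score is at k=2, in line with the output above.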

Upvotes: 2
