Python memory error for KMeans in scikit-learn

Question

I am running the "Selecting the number of clusters" example from scikit-learn in Python. The example takes several samples with 2 features and finds the best k for k-means clustering.

In my case the samples have 3 features; they are 3-dimensional coordinates. In the code I only changed the input to my samples and left the rest unchanged. The number of sample points is very large, possibly more than 10,000.

When I pass in all of my data I get a MemoryError (I have 16 GB of RAM and all of it fills up), but with half of the data the error does not occur. Although the IPython notebook reports the error in the silhouette function, I am fairly sure it happens in k-means: the clustering does not seem to run and the code jumps straight to this error.
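
One way to check which step actually exhausts memory is to run the two steps separately. The sketch below is only an illustration: the random array `X`, its size, and `n_clusters=4` are placeholders standing in for the real samples, not values from the original code. It fits KMeans on its own, so a failure at this stage can be told apart from one inside `silhouette_score`.

```python
import numpy as np
from sklearn.cluster import KMeans

# Placeholder data (assumption): random 3-D coordinates standing in
# for the real sample points.
rng = np.random.RandomState(0)
X = rng.rand(20000, 3)

# Fit KMeans alone, without computing any silhouette values.
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0)
cluster_labels = kmeans.fit_predict(X)

# One label per sample, one 3-D centre per cluster.
print(cluster_labels.shape)
print(kmeans.cluster_centers_.shape)
```

If this part finishes without trouble on the full data set, the clustering itself is probably not what fills the RAM.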

With the same amount of data I ran k-means clustering in C++ and it was fast, with no problems. Any idea how I can resolve this? This is the error I got:

    MemoryError                               Traceback (most recent call last)
    in ()
         41     # This gives a perspective into the density and separation of the formed
         42     # clusters
    ---> 43     silhouette_avg = silhouette_score(X, cluster_labels)
         44     print("For n_clusters =", n_clusters,
         45           "The average silhouette_score is :", silhouette_avg)

    /usr/lib64/python2.7/site-packages/sklearn/metrics/cluster/unsupervised.pyc in silhouette_score(X, labels, metric, sample_size, random_state, **kwds)
         82         else:
         83             X, labels = X[indices], labels[indices]
    ---> 84     return np.mean(silhouette_samples(X, labels, metric=metric, **kwds))
         85
         86

    /usr/lib64/python2.7/site-packages/sklearn/metrics/cluster/unsupervised.pyc in silhouette_samples(X, labels, metric, **kwds)
        141
        142     """
    --> 143     distances = pairwise_distances(X, metric=metric, **kwds)
        144     n = labels.shape[0]
        145     A = np.array([_intra_cluster_distance(distances[i], labels, i)

    /usr/lib64/python2.7/site-packages/sklearn/metrics/pairwise.pyc in pairwise_distances(X, Y, metric, n_jobs, **kwds)
        649         func = pairwise_distance_functions[metric]
        650         if n_jobs == 1:
    --> 651             return func(X, Y, **kwds)
        652         else:
        653             return _parallel_pairwise(X, Y, func, n_jobs, **kwds)

    /usr/lib64/python2.7/site-packages/sklearn/metrics/pairwise.pyc in euclidean_distances(X, Y, Y_norm_squared, squared)
        181         distances.flat[::distances.shape[0] + 1] = 0.0
        182
    --> 183     return distances if squared else np.sqrt(distances)
        184
        185

    MemoryError:
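
The traceback ends in `euclidean_distances`, reached through `pairwise_distances(X, ...)` inside `silhouette_samples`, so in the version shown the silhouette computation builds a full n × n distance matrix. The sketch below gives a rough estimate of that matrix's size, assuming 8-byte (float64) values; the helper name `pairwise_matrix_gib` and the placeholder data are made up for illustration. It then uses the `sample_size` argument that appears in the `silhouette_score` signature above, which scores a random subset of the points instead of all of them.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def pairwise_matrix_gib(n_samples, bytes_per_value=8):
    """Approximate size in GiB of a dense n x n distance matrix."""
    return n_samples ** 2 * bytes_per_value / 2.0 ** 30


# The matrix grows quadratically with the number of points.
for n in (10000, 50000, 100000):
    print(n, "points:", round(pairwise_matrix_gib(n), 2), "GiB")

# Placeholder data (assumption): random 3-D coordinates.
rng = np.random.RandomState(0)
X = rng.rand(20000, 3)
cluster_labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)

# Score a random subset of 2000 points, so only a 2000 x 2000
# distance matrix is needed instead of a 20000 x 20000 one.
silhouette_avg = silhouette_score(X, cluster_labels,
                                  sample_size=2000, random_state=0)
print("Average silhouette score on the subset:", silhouette_avg)
```

Note that the failing line, `np.sqrt(distances)`, allocates a second array of the same size, so peak usage at that point is roughly twice the estimate. The subsampled score is an approximation of the full one, and the subset size of 2000 is an arbitrary choice here.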

