Denis Yakovenko

Reputation: 3535

"index N is out of bounds for axis 0 with size N" when running Parallel KMeans whereas sequential KMeans works fine

I'm trying to run scikit-learn's KMeans implementation in parallel, but I keep getting the following error:

Traceback (most recent call last):
  File "run_kmeans.py", line 114, in <module>
    kmeans = KMeans(n_clusters=2048, n_jobs=-1).fit(descriptors)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/cluster/k_means_.py", line 889, in fit
    return_n_iter=True)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/cluster/k_means_.py", line 362, in k_means
    for seed in seeds)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/externals/joblib/parallel.py", line 768, in __call__
    self.retrieve()
  File "/usr/local/lib/python2.7/dist-packages/sklearn/externals/joblib/parallel.py", line 719, in retrieve
    raise exception
sklearn.externals.joblib.my_exceptions.JoblibIndexError: JoblibIndexError
_________________________________________________________________________
Multiprocessing exception:
..........................................................................
IndexError: index 11683 is out of bounds for axis 0 with size 11683

When I run KMeans with n_jobs=1, i.e. sequentially, I get no errors and everything works fine. With n_jobs=-1, however, the error occurs every time.

Here's the code I use:

kmeans = KMeans(n_clusters=2048, n_jobs=-1).fit(descriptors)

descriptors is a numpy array with shape (11683, 128).
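Since the question gives the shape but not the dtype of descriptors, it helps to print both before calling fit. The sketch below uses a hypothetical stand-in array with the same shape; the float32 dtype is an assumption (the question doesn't state it), chosen because it is the case the answer below addresses.

```python
import numpy as np

# Hypothetical stand-in for the real descriptors: same shape as in the
# question, random values, float32 dtype assumed (not stated in the question).
rng = np.random.default_rng(0)
descriptors = rng.random((11683, 128), dtype=np.float32)

# Inspect exactly what will be passed to KMeans.fit.
print(descriptors.shape)  # (11683, 128)
print(descriptors.dtype)  # float32
```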

Am I doing something wrong, or is this a bug in the KMeans implementation?

What should I do about it (e.g. use MiniBatchKMeans instead)?

PS: I'm running this on an Ubuntu 16.04 64-bit machine with 4 GB of RAM and an Intel Core i7-4700HQ 2.40GHz CPU.

Upvotes: 1

Views: 1414

Answers (1)

Jacoxu

Reputation: 76

This problem can be fixed by converting the input data to float64, i.e. by fitting on descriptors.astype(np.float64) instead of descriptors.

https://github.com/scikit-learn/scikit-learn/issues/8583
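A minimal sketch of that workaround, assuming descriptors is the array from the question: cast it to float64 and pass the converted array to fit. The helper name fit_kmeans_float64 is hypothetical; extra keyword arguments are forwarded to KMeans unchanged.

```python
import numpy as np
from sklearn.cluster import KMeans


def fit_kmeans_float64(descriptors, n_clusters, **kmeans_kwargs):
    """Hypothetical helper: cast the input to float64, then fit KMeans."""
    # np.asarray with dtype=float64 converts float32 input to a new float64
    # array (no copy is made if the input is already float64).
    X = np.asarray(descriptors, dtype=np.float64)
    return KMeans(n_clusters=n_clusters, **kmeans_kwargs).fit(X)
```

With the scikit-learn version from the question, the original call would become fit_kmeans_float64(descriptors, 2048, n_jobs=-1). In newer scikit-learn releases the n_jobs argument of KMeans was deprecated (0.23) and later removed, so it should be dropped there.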

Upvotes: 3
