Reputation: 309
In my project I use k-means to cluster data into groups, but scikit-learn's k-means computation is very slow and I need to speed it up.
I have tried setting n_jobs to -1, but it is still very slow.
Any suggestions on how to speed it up?
Upvotes: 14
Views: 19847
Reputation: 40159
scikit-learn 0.23 and later comes with an optimized k-means implementation that uses a new way to parallelize work across CPUs:
https://scikit-learn.fondation-inria.fr/implementing-a-faster-kmeans-in-scikit-learn-0-23/
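As a rough sketch of what that looks like in practice: from 0.23 on, KMeans parallelizes through OpenMP threads rather than through n_jobs (which was deprecated in 0.23 and later removed), so the thread count is controlled from outside the estimator. The example below assumes the threadpoolctl package is available (it is installed as a scikit-learn dependency in these versions); the synthetic data and the limit of 2 threads are placeholders for illustration only.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from threadpoolctl import threadpool_limits

# Synthetic stand-in for the real data (assumption: dense numeric features).
X, _ = make_blobs(n_samples=5_000, centers=4, n_features=8, random_state=0)

km = KMeans(n_clusters=4, n_init=10, random_state=0)

# By default KMeans uses all available cores via OpenMP.
# threadpool_limits caps the thread count for the duration of the block;
# the value 2 is arbitrary and only here to show the mechanism.
with threadpool_limits(limits=2, user_api="openmp"):
    km.fit(X)

print(km.cluster_centers_.shape)
```

Without the context manager, the fit simply uses every core it can, which is usually what you want when speed is the goal.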
Upvotes: 2
Reputation: 66805
The main solution within scikit-learn is to switch to mini-batch k-means (MiniBatchKMeans), which greatly reduces the computation required. It is loosely analogous to SGD (stochastic gradient descent) versus GD (gradient descent) for optimising non-linear functions: SGD usually needs fewer computational cycles to converge to a local solution. Note that this introduces more variance into the optimisation, so results may be harder to reproduce (runs will end up in different solutions more often than with "full batch" k-means).
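A minimal sketch of the switch, assuming the data fits in memory as a NumPy array. The synthetic data, the number of clusters, and the batch_size value are placeholders chosen for illustration, not tuned recommendations.

```python
from sklearn.cluster import MiniBatchKMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in for the real data set.
X, _ = make_blobs(n_samples=10_000, centers=5, n_features=10, random_state=0)

# Each iteration updates the centroids from a random batch of samples
# instead of a full pass over X.
mbk = MiniBatchKMeans(
    n_clusters=5,
    batch_size=1024,   # illustrative value; larger batches behave more like full k-means
    n_init=3,
    random_state=0,    # fixing the seed keeps the random batch sampling repeatable
)
labels = mbk.fit_predict(X)

print(mbk.cluster_centers_.shape, labels.shape)
```

Fixing random_state addresses the reproducibility concern for a given run, but different seeds can still land in different local solutions, so it is worth comparing inertia_ across a few seeds before trusting one result.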
Upvotes: 17