How to speed up k-means in scikit-learn?

In my project I used k-means to classify data into groups, but the k-means computation in scikit-learn was very slow. I need to speed it up.

I tried setting n_jobs to -1, but it is still very slow!

Any suggestions on how to speed it up?

Upvotes: 14

Answers (2)

ogrisel

Reputation: 40159

scikit-learn 0.23+ now comes with an optimized implementation with a new way to parallelize work across CPUs:

https://scikit-learn.fondation-inria.fr/implementing-a-faster-kmeans-in-scikit-learn-0-23/
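As a rough sketch of what this means in practice: from 0.23 on, KMeans parallelizes through OpenMP threads instead of n_jobs (n_jobs was deprecated for KMeans in 0.23 and later removed), so the thread count is set from outside the estimator. The example below uses threadpoolctl, a scikit-learn dependency, to do that. The synthetic data, the limit of 2 threads and the cluster count are assumptions for illustration only.

```python
import numpy as np
import sklearn
from sklearn.cluster import KMeans
from threadpoolctl import threadpool_limits

# The optimized implementation requires scikit-learn 0.23 or newer.
print("scikit-learn version:", sklearn.__version__)

# Synthetic data, purely illustrative (not from the question).
rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 8))

# Cap the OpenMP threads used inside this block; without the context
# manager, KMeans uses the available cores by default.
with threadpool_limits(limits=2, user_api="openmp"):
    km = KMeans(n_clusters=5, n_init=1, random_state=0).fit(X)

print(km.cluster_centers_.shape)
```

Setting the OMP_NUM_THREADS environment variable before starting Python is another way to get the same effect.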

Upvotes: 2

lejlot

Reputation: 66805

The main solution in scikit-learn is to switch to mini-batch k-means (MiniBatchKMeans), which reduces the computational cost considerably. To some extent it is analogous to SGD (Stochastic Gradient Descent) vs. GD (Gradient Descent) for optimising non-linear functions: SGD is usually faster in terms of the computational cycles needed to converge to a local solution. Note that this introduces more variance into the optimisation, so results might be harder to reproduce (the optimisation will end up in different solutions more often than with "full batch" k-means).
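A minimal sketch of the switch, assuming synthetic blob data and arbitrary settings (batch_size=256, three clusters) that are not from the question. MiniBatchKMeans takes largely the same arguments as KMeans, so it is close to a drop-in replacement:

```python
import numpy as np
from sklearn.cluster import KMeans, MiniBatchKMeans

# Synthetic data: three well-separated 2-D blobs (illustrative only).
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
X = np.vstack([c + rng.normal(size=(2000, 2)) for c in centers])

# Full-batch k-means: every iteration uses all samples.
full = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

# Mini-batch k-means: each update uses a random batch of 256 samples.
# Fixing random_state makes a single run repeatable despite the
# extra variance of the mini-batch updates.
mini = MiniBatchKMeans(
    n_clusters=3, batch_size=256, n_init=3, random_state=0
).fit(X)

# Inertia (within-cluster sum of squares) lets you compare solution quality.
print("full-batch inertia:", full.inertia_)
print("mini-batch inertia:", mini.inertia_)
```

Comparing the two inertia values on your own data is a quick way to check how much quality the faster variant gives up, if any.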

Upvotes: 17
