Vitor Fernandes

Reputation: 31

Reduce dist directory size while using sklearn.cluster.KMeans (python + py2exe)

I'm having a little trouble turning my Python script into an executable: its size is too big for me to distribute to my client.

The problem is that I use only a small piece of sklearn, yet it results in a total of 240 MB inside my distribution directory. I understand that using only one feature doesn't mean the rest of the package can simply be left out. But I'm looking for a way to reduce this size, or for an alternative to the KMeans class in a more lightweight machine-learning package for Python.

If needed, the parts of the code that use this feature are:

from sklearn.cluster import KMeans
...
# clus just holds an instance of KMeans
clus = KMeans(n_clusters=_numBlocks, random_state=1, n_jobs=1)
# and here I just call its fit_predict method
_hourmap = clus.fit_predict(Load2Clus)
...

Upvotes: 3

Views: 185

Answers (1)

Jamie Bull

Reputation: 13529

K-means is a very simple algorithm and just a tiny part of sklearn, as you recognise. I'd avoid sklearn if you are constrained on size and KMeans is the only part of the whole package that you use. You may also not need numpy, scipy and possibly other packages, unless you're using them elsewhere in your code.

Your options are:

  • Implement your own version of K-means in Python.
  • Use the simple kmeans package, which wraps a C implementation of K-means.
  • Use a different lightweight package, as you have already suggested.
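
To give an idea of how small the first option can be, here is a minimal sketch of K-means (Lloyd's algorithm) using only the standard library. The function name `kmeans_fit_predict` and its parameters are my own invention for illustration, loosely mirroring `KMeans.fit_predict`. It picks its starting centroids by plain random sampling rather than sklearn's default initialisation, so expect the label numbering, and possibly the clusters themselves, to differ from what sklearn returns.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_fit_predict(points, n_clusters, random_state=1, max_iter=300):
    """Cluster `points` with Lloyd's algorithm; return one label per point.

    Hypothetical helper, not part of any library.
    """
    points = [tuple(float(x) for x in p) for p in points]
    if not 0 < n_clusters <= len(points):
        raise ValueError("n_clusters must be between 1 and len(points)")

    rng = random.Random(random_state)
    # Initialise the centroids by sampling input points at random.
    centroids = rng.sample(points, n_clusters)
    labels = None

    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(n_clusters), key=lambda k: sq_dist(p, centroids[k]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so we have converged
        labels = new_labels

        # Update step: move each centroid to the mean of its members.
        for k in range(n_clusters):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # an empty cluster keeps its previous centroid
                centroids[k] = tuple(
                    sum(coord) / len(members) for coord in zip(*members)
                )
    return labels
```

With something like this in place, the call from the question would become `_hourmap = kmeans_fit_predict(Load2Clus, _numBlocks)`, assuming `Load2Clus` can be iterated as rows of numbers. It will be slower than sklearn on large inputs, so check that the speed is acceptable for your data.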

Other things to consider for reducing the size of your library archive include:

  • Excluding various external libraries
  • Excluding parts of the standard library
  • Compressing the archive

Which of these will suit you best depends on your program.
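
As a rough sketch of how the exclusion and compression points might look in a `setup.py`, assuming the classic distutils-based py2exe workflow: the script name `my_script.py` and the modules listed under `excludes` are placeholders I made up, and whether any given module is safe to exclude depends entirely on what your program imports.

```python
import sys

# Hypothetical entry-point script; replace with your own.
SCRIPT = "my_script.py"

PY2EXE_OPTIONS = {
    # Modules to leave out of the bundle. These names are illustrative only;
    # exclude a module only if nothing in your program needs it at runtime.
    "excludes": ["tkinter", "unittest", "pydoc", "doctest"],
    # Compress the library archive that py2exe produces.
    "compressed": True,
}


def run_setup():
    """Invoke the build. Imports are local so this file loads without py2exe."""
    from distutils.core import setup
    import py2exe  # noqa: F401  (importing it registers the "py2exe" command)

    setup(console=[SCRIPT], options={"py2exe": PY2EXE_OPTIONS})


# Only build when called as: python setup.py py2exe
if "py2exe" in sys.argv[1:]:
    run_setup()
```

After each change, rebuild and run the executable on a clean machine, since an over-eager exclude typically only shows up as an ImportError at runtime.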

Upvotes: 1
