Reputation: 383
Has anyone succeeded in speeding up scikit-learn models using numba and JIT compilation? The specific models I am looking at are regression models such as LogisticRegression.
I am able to use numba to optimize the functions I write around sklearn models, but the model functions themselves are not affected by this and are not optimized, so there is no notable increase in speed. Is there a way to optimize the sklearn functions?
Any info about this would be much appreciated.
Upvotes: 13
Views: 9896
Reputation: 11
The decorators @numba.vectorize (for functions applied element-wise to arrays) and @numba.guvectorize (for functions that operate on whole sub-arrays, such as matrices) may help, since they compile the loop over the elements together with the function body. They generate so-called "ufuncs", but instead of your having to write the C code manually, numba generates it from the Python input.
See: http://numba.pydata.org/numba-doc/dev/user/vectorize.html
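As a minimal sketch of what such a decorator looks like in practice, here is an element-wise logistic (sigmoid) function compiled into a ufunc. The function name sigmoid and the test data are made up for illustration, and the ImportError fallback is only there so the snippet still runs (without any speed-up) when numba is not installed.

```python
import math
import numpy as np

try:
    from numba import vectorize
except ImportError:
    # Assumption for illustration: fall back to np.vectorize so the sketch
    # still runs without numba. This fallback gives no speed-up.
    def vectorize(signatures):
        return lambda f: np.vectorize(f, otypes=[np.float64])

# Written as a scalar function; the decorator turns it into a ufunc
# that broadcasts over whole arrays.
@vectorize(["float64(float64)"])
def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

z = np.linspace(-3.0, 3.0, 7)   # -3, -2, ..., 3
probs = sigmoid(z)              # applied element-wise to the array
print(probs)
```

Note that this speeds up code you write yourself; it does not change how scikit-learn's own fit or predict methods run.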
Upvotes: 1
Reputation: 2487
Scikit-learn makes heavy use of numpy, most of which is written in C and already compiled (hence not eligible for JIT optimization).
Further, the LogisticRegression model is essentially LinearSVC with the appropriate loss function. I could be slightly wrong about that, but in any case, it uses LIBLINEAR to do the solving, which is again a compiled C library.
The makers of scikit-learn also make heavy use of one of the Python-to-C compilers, Cython (a descendant of Pyrex), which again results in optimized, compiled machine code that is ineligible for JIT compilation.
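To make the split concrete, here is a rough sketch under some assumptions: the fit call runs in scikit-learn's already-compiled code, while a hand-written scoring loop that only uses the fitted coef_ and intercept_ is the kind of code numba can actually compile. The helper positive_class_proba and the synthetic data are hypothetical, the liblinear solver is chosen explicitly to match the point above, and the ImportError fallback only keeps the snippet runnable without numba.

```python
import math
import numpy as np
from sklearn.linear_model import LogisticRegression

try:
    from numba import njit
except ImportError:
    # Assumption for illustration: no-op decorator if numba is missing.
    def njit(func):
        return func

# Synthetic binary classification data (made up for this sketch).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(np.int64)

# Fitting happens inside compiled library code; numba has nothing to JIT here.
model = LogisticRegression(solver="liblinear").fit(X, y)

# Hypothetical helper: user-written scoring loop that numba can compile.
@njit
def positive_class_proba(X, coef, intercept):
    n_samples, n_features = X.shape
    out = np.empty(n_samples)
    for i in range(n_samples):
        z = intercept
        for j in range(n_features):
            z += X[i, j] * coef[j]
        out[i] = 1.0 / (1.0 + math.exp(-z))
    return out

manual = positive_class_proba(X, model.coef_[0], float(model.intercept_[0]))
reference = model.predict_proba(X)[:, 1]
print(np.allclose(manual, reference))
```

Whether the hand-written loop is any faster than predict_proba depends on the data size and is not something this sketch measures.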
Upvotes: 10