mzm

Reputation: 383

Using Numba with scikit-learn

Has anyone succeeded in speeding up scikit-learn models using numba and JIT compilation? The specific models I am looking at are regression models such as LogisticRegression.

I am able to use numba to optimize the functions I write around sklearn models, but the model functions themselves are not affected by this and are not optimized, so there is no notable increase in speed. Is there a way to optimize the sklearn functions?

Any info about this would be much appreciated.

Upvotes: 13

Views: 9896

Answers (2)

joshring

Reputation: 11

The decorators @numba.vectorize (element-wise operations on arrays) and @numba.guvectorize (operations on sub-arrays, such as the rows of a matrix) may help, since they work to combine loop operations. They generate so-called "ufuncs" which achieve this goal, but instead of you having to write the C code yourself, numba generates the compiled code from the Python input.

See: http://numba.pydata.org/numba-doc/dev/user/vectorize.html

Upvotes: 1

Andreus

Reputation: 2487

Scikit-learn makes heavy use of numpy, most of which is written in C and already compiled (hence not eligible for JIT optimization).

Further, the LogisticRegression model is essentially LinearSVC with the appropriate loss function. I could be slightly wrong about that, but in any case, with the liblinear solver it uses LIBLINEAR to do the solving, which is again a compiled C/C++ library.

The makers of scikit-learn also make heavy use of one of the Python-to-compiled systems, Cython (a descendant of Pyrex), which again results in optimized, ahead-of-time compiled machine code that is ineligible for JIT compilation.

Upvotes: 10
