ashu

Reputation: 489

Is it possible to load the model once and reuse it again in python?

I have trained a scikit-learn model and now want to use it in my Python code. Is there a way to reuse the same model instance? The simple approach is to load the model again whenever I need it, but since I need it frequently, I would rather load it once and reuse it.

Is there a way to achieve this in Python?

Here is the code for one thread in prediction.py:

import joblib

clf = joblib.load('trainedsgdhuberclassifier.pkl')
clf.predict(userid)

Now, for another user, I don't want to start prediction.py again and spend time loading the model. Is there a way I can simply write:

new_recommendations = prediction(userid)

Is multiprocessing what I should be using here? I am not sure.

Upvotes: 2

Views: 4446

Answers (2)

Andreas Mueller

Reputation: 28788

First, you should check how much of a bottleneck this really is and whether avoiding the IO is worth it. An SGDClassifier is usually quite small. You can easily reuse the model, but I would say the question is not really about how to reuse the model, but about how to get the new user instances to the classifier.
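As a rough illustration of reusing the model inside a single long-running process, the sketch below caches the loaded object with functools.lru_cache so the file is read only on the first call. The LinearModel class, the JSON parameter file, and the names get_model and prediction are all hypothetical stand-ins chosen so the example runs without scikit-learn; with a real estimator, the body of get_model would presumably just return joblib.load(path).

```python
import functools
import json
import os
import tempfile


class LinearModel:
    """Hypothetical stand-in for a fitted estimator; only predict() matters."""

    def __init__(self, weights, bias):
        self.weights = weights
        self.bias = bias

    def predict(self, rows):
        # One label per row: 1 if the weighted sum plus bias is positive.
        return [
            int(sum(w * x for w, x in zip(self.weights, row)) + self.bias > 0)
            for row in rows
        ]


@functools.lru_cache(maxsize=None)
def get_model(path):
    """Read the model from disk once; later calls return the cached object.

    Assumption: with scikit-learn this body would be `return joblib.load(path)`.
    """
    with open(path) as fh:
        params = json.load(fh)
    return LinearModel(params["weights"], params["bias"])


def prediction(features, model_path):
    """Predict for one feature vector using the cached model."""
    model = get_model(model_path)
    return model.predict([features])[0]  # wrap in a list: predict expects 2D input


# Demo: write the stand-in model's parameters to a temporary file.
MODEL_PATH = os.path.join(tempfile.mkdtemp(), "model.json")
with open(MODEL_PATH, "w") as fh:
    json.dump({"weights": [1.0, 1.0], "bias": -1.0}, fh)

first = prediction([0.2, 0.3], MODEL_PATH)   # reads the file
second = prediction([0.9, 0.8], MODEL_PATH)  # reuses the cached model
```

This only helps while the process stays alive; each new run of the script still pays the load cost once.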

I would imagine userid is a feature vector, not an ID, right?

To make the model predict on new data, you need some kind of event-based processing that calls the model when a new input arrives. I am by no means an expert here, but I think one simple solution might be to put the model behind an HTTP interface using a lightweight server like Flask.
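To make the event-based idea concrete without any web framework, here is a minimal in-process sketch: a worker thread holds one model instance and handles inputs as they arrive on a queue. This is not the Flask approach itself, only a standard-library stand-in for the same pattern (load once at startup, call predict per incoming request). StubModel, worker, and serve are hypothetical names, and the stub replaces whatever joblib.load would return.

```python
import queue
import threading


class StubModel:
    """Hypothetical stand-in for the estimator returned by joblib.load()."""

    def predict(self, rows):
        # One label per row: 1 if the feature sum exceeds 1.0.
        return [int(sum(row) > 1.0) for row in rows]


def worker(model, requests, results):
    """Handle inputs as they arrive until a None sentinel is received."""
    while True:
        item = requests.get()
        if item is None:
            break
        request_id, features = item
        results[request_id] = model.predict([features])[0]


def serve(inputs):
    """Create the model once, then feed it each (request_id, features) pair."""
    model = StubModel()  # assumption: in real use, joblib.load(...) goes here
    requests = queue.Queue()
    results = {}
    thread = threading.Thread(target=worker, args=(model, requests, results))
    thread.start()
    for item in inputs:
        requests.put(item)
    requests.put(None)  # sentinel: tell the worker to stop
    thread.join()
    return results


results = serve([("user-1", [0.2, 0.3]), ("user-2", [0.9, 0.8])])
```

In an HTTP setup the queue would be replaced by the server's request handling, but the model would still be created once at startup rather than per request.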

Upvotes: 0

Oq01

Reputation: 157

As per the scikit-learn documentation, the following code may help you:

import pickle
from sklearn import svm
from sklearn import datasets

clf = svm.SVC()
iris = datasets.load_iris()
X, y = iris.data, iris.target
clf.fit(X, y)

# Serialize the fitted model to a bytes object and restore it
s = pickle.dumps(clf)
clf2 = pickle.loads(s)
# predict expects a 2D array, so pass a one-row slice rather than X[0]
clf2.predict(X[0:1])

In the specific case of scikit-learn, it may be more interesting to use joblib’s replacement of pickle (joblib.dump & joblib.load), which is more efficient on objects that carry large numpy arrays internally, as is often the case for fitted scikit-learn estimators, but can only pickle to the disk and not to a string:

# sklearn.externals.joblib was removed in scikit-learn 0.23; import joblib directly
import joblib
joblib.dump(clf, 'filename.pkl')

Later you can load back the pickled model (possibly in another Python process) with:

clf = joblib.load('filename.pkl') 

Once you have loaded your model again, you can reuse it without retraining it:

clf.predict(X[0:1])

Source: http://scikit-learn.org/stable/modules/model_persistence.html

Upvotes: 7
