Reputation: 451
In R, after running a "random forest" model, I can use save.image("***.RData")
to store the model. Afterwards, I can just load the model to do predictions directly.
Can you do a similar thing in Python? I separate the model and prediction into two files. In the model file:
rf= RandomForestRegressor(n_estimators=250, max_features=9,compute_importances=True)
fit= rf.fit(Predx, Predy)
I tried returning rf
or fit
, but I still can't load the model in the prediction file.
Can you separate the model and prediction steps using the sklearn random forest package?
Upvotes: 36
Views: 54536
Reputation: 1360
I'd reiterate that joblib does the job well, and it provides really good compression options (e.g. lzma). Assuming clf is an already-fitted classifier:
with open("clf.pkl", "wb") as out: pickle.dump(clf, out)
with open("clf.dill", "wb") as out: dill.dump(clf, out)
joblib.dump(clf, "clf.jbl")
joblib.dump(clf, "clf.jbl.lzma")
joblib.dump(clf, "clf.jbl.gz")
!du clf.*
24576 clf.dill
24576 clf.jbl
5120 clf.jbl.gz
3072 clf.jbl.lzma
24576 clf.pkl
Upvotes: 0
Reputation: 5859
You can use joblib
to save and load the Random Forest from scikit-learn (in fact, any model from scikit-learn).
An example:
import joblib
from sklearn.ensemble import RandomForestClassifier
# create RF
rf = RandomForestClassifier()
# fit on some training data (X, y assumed to be defined)
rf.fit(X, y)
# save
joblib.dump(rf, "my_random_forest.joblib")
# load
loaded_rf = joblib.load("my_random_forest.joblib")
What's more, joblib.dump
has a compress
argument, so the model can be compressed. I ran a very simple test on the iris dataset, and compress=3
reduced the file size by about 5.6 times.
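As a rough sketch of that test (exact sizes and the ratio will vary with scikit-learn version and n_estimators; the file names here are arbitrary):

```python
# Sketch: measuring the effect of joblib's compress argument on an
# iris-trained forest. File names and n_estimators are arbitrary choices.
import os
import joblib
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
rf = RandomForestClassifier(n_estimators=100).fit(X, y)

joblib.dump(rf, "rf_raw.joblib")                # no compression
joblib.dump(rf, "rf_small.joblib", compress=3)  # zlib, level 3

ratio = os.path.getsize("rf_raw.joblib") / os.path.getsize("rf_small.joblib")
print(f"compression ratio: {ratio:.1f}x")
```

compress also accepts a (method, level) tuple, e.g. compress=('lzma', 3), which matches the lzma option mentioned in another answer.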
Upvotes: 22
Reputation: 499
For model storage you can also use the .sav format; it stores the complete model and its information.
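Note that .sav is only a filename convention (popularized by various tutorials); the file is still written and read with pickle, so a minimal sketch would be:

```python
# .sav is just a naming convention; pickle does the actual serialization.
import pickle
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
model = RandomForestClassifier(n_estimators=50).fit(X, y)

# save the complete fitted model under a .sav name
with open("finalized_model.sav", "wb") as f:
    pickle.dump(model, f)

# load it back and use it for prediction
with open("finalized_model.sav", "rb") as f:
    loaded_model = pickle.load(f)
```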
Upvotes: 0
Reputation: 30737
I use dill; it stores all the data and, I think, possibly module information too. Maybe not. I remember trying to use pickle
for storing these really complicated objects and it didn't work for me. cPickle
probably does the same job as dill
, but I've never tried cPickle
. It looks like it works in literally the exact same way. I use the "obj" extension, but that's by no means conventional... it just made sense to me since I was storing an object.
import dill
wd = "/whatever/you/want/your/working/directory/to/be/"
rf = RandomForestRegressor(n_estimators=250, max_features=9)  # note: compute_importances was removed from scikit-learn; importances are always available via rf.feature_importances_
rf.fit(Predx, Predy)
dill.dump(rf, open(wd + "filename.obj","wb"))
btw, not sure if you use IPython, but sometimes writing a file that way doesn't work, so you have to do the:
with open(wd + "filename.obj","wb") as f:
dill.dump(rf,f)
To load the object again:
model = dill.load(open(wd + "filename.obj","rb"))
Upvotes: 2
Reputation: 6545
...
import cPickle
from sklearn.ensemble import RandomForestRegressor

rf = RandomForestRegressor()
rf.fit(X, y)
with open('path/to/file', 'wb') as f:
cPickle.dump(rf, f)
# in your prediction file
with open('path/to/file', 'rb') as f:
rf = cPickle.load(f)
preds = rf.predict(new_X)
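Note that cPickle exists only in Python 2; in Python 3 the C-accelerated implementation is used automatically when you import pickle, so the equivalent code (sketched here with the iris data standing in for X, y, and new_X) is:

```python
# Python 3 version: plain `pickle` already uses the fast C implementation.
import pickle
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestRegressor

X, y = load_iris(return_X_y=True)
rf = RandomForestRegressor(n_estimators=50)
rf.fit(X, y)

with open("rf_model.pkl", "wb") as f:
    pickle.dump(rf, f)

# in your prediction file
with open("rf_model.pkl", "rb") as f:
    rf_loaded = pickle.load(f)

preds = rf_loaded.predict(X)
```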
Upvotes: 45