Emkan

Reputation: 188

Error in joblib.load file loading

I am using scikit-learn's Random Forest Regressor in Python to predict some values. I used joblib.dump to save the models. There are 24 joblib.dump files, each about 45 MB (sum of all files = 931 MB). My problem is:

I want to load all 24 files in one program to predict 24 values - but I cannot do it: it raises a MemoryError. How can I load all 24 joblib files in one program without any errors?

Thanks in advance...

Upvotes: 4

Views: 3787

Answers (1)

volodymyr

Reputation: 7554

There are a few options, depending on where exactly you are running out of memory.

  • Since you are predicting 24 different values based on the same input data, you can run the predictions sequentially, keeping only one RFR in memory at a time.

e.g.:

import joblib

# Load one saved regressor at a time, predict, then free it before the next load
predictions = []
for regressor_file in all_regressors:  # paths to the 24 joblib.dump files
    regressor = joblib.load(regressor_file)
    predictions.append(regressor.predict(X))
    del regressor  # keep only one model in memory at a time
  • (This might not apply to your case, but the problem is very common.) You might be running out of memory when loading a large batch of input data. To solve this, split your input data and run prediction on each sub-batch (see the batching sketch after this list). That helped us when we moved from running predictions locally to EC2. Try running your code on a smaller input dataset to test whether this helps.

  • You may want to optimise the parameters of the RFR. You may find that you get the same predictive power with shallower trees or fewer trees (or both); it is very easy to build a Random Forest that is just unnecessarily big. This is, of course, problem specific. I had to reduce the number of trees and make them shallower to make the model run efficiently in production; in my case, AUC was the same before and after the optimisation. A tuning sketch follows below. This last step of model tuning is sometimes omitted from tutorials.
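
Here is a minimal sketch of the batching idea from the second bullet, assuming X is a NumPy array and regressor is an already-loaded model; the batch_size value is an arbitrary placeholder to tune for your memory budget:

import numpy as np

def predict_in_batches(regressor, X, batch_size=10000):
    # Predict on slices of X so only one sub-batch is in memory at a time
    outputs = []
    for start in range(0, len(X), batch_size):
        batch = X[start:start + batch_size]
        outputs.append(regressor.predict(batch))
    return np.concatenate(outputs)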
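
And a sketch of the tuning step from the last bullet, using GridSearchCV (my choice here, not something the original setup requires) and assuming you still have the training data as X_train/y_train; the n_estimators and max_depth values are illustrative, not recommendations:

from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Search over smaller/shallower forests; pick the cheapest model whose
# cross-validated score matches the original one
param_grid = {"n_estimators": [50, 100, 300], "max_depth": [5, 10, None]}
search = GridSearchCV(RandomForestRegressor(random_state=0), param_grid, cv=3)
search.fit(X_train, y_train)
print(search.best_params_, search.best_score_)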

Upvotes: 1
