Reputation: 33
I am developing an application for university. The web app loads a trained model with joblib, and that model depends on a custom transformer class, FlexibleScaler:
flexible.py

```python
from sklearn.preprocessing import MinMaxScaler, StandardScaler, PowerTransformer, MaxAbsScaler, RobustScaler, Normalizer
from sklearn.base import BaseEstimator, TransformerMixin


class FlexibleScaler(BaseEstimator, TransformerMixin):
    """Wraps one of several sklearn scalers, selected by name."""

    def __init__(self, scaler=None):
        self.scaler = scaler
        self.check = False

    def __assign_scaler(self):
        # Map the scaler name to a fresh sklearn transformer (None means pass-through).
        if self.scaler == 'min-max':
            self.method = MinMaxScaler()
        elif self.scaler == 'standard':
            self.method = StandardScaler()
        elif self.scaler == 'yeo-johnson':
            self.method = PowerTransformer(method='yeo-johnson')
        elif self.scaler == 'box-cox':
            self.method = PowerTransformer(method='box-cox')
        elif self.scaler == 'max-abs':
            self.method = MaxAbsScaler()
        elif self.scaler == 'robust':
            self.method = RobustScaler()
        elif self.scaler == 'normalize':
            self.method = Normalizer()
        else:
            self.method = None
        self.check = True

    def fit_transform(self, X, y=None, **fit_params):
        if not self.check:
            self.__assign_scaler()
        if self.method is None:
            return X
        return self.method.fit_transform(X, y, **fit_params)

    def fit(self, X, y=None):
        if not self.check:
            self.__assign_scaler()
        if self.method is not None:
            self.method.fit(X)
        return self  # sklearn convention: fit() returns the estimator itself

    def transform(self, X):
        if not self.check:
            self.__assign_scaler()
        if self.method is None:
            return X
        return self.method.transform(X)
```
flask_start.py

```python
from flask import Flask, Response, render_template, request, flash, redirect, session, g
import joblib
import pandas as pd
import numpy as np
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import MinMaxScaler, StandardScaler, PowerTransformer, MaxAbsScaler, RobustScaler, Normalizer
from sklearn.base import BaseEstimator, TransformerMixin
from flexible import FlexibleScaler

UPLOAD_FOLDER = '/tmp/'
ALLOWED_EXTENSIONS = {'csv'}

app = Flask(__name__)
app.config['UPLOAD_FOLDER'] = UPLOAD_FOLDER
app.config['ALLOWED_EXTENSIONS'] = ALLOWED_EXTENSIONS


@app.route("/", methods=["POST", "GET"])
def home():
    if request.method == 'POST':
        # get data to process (omitted)
        clf = joblib.load('ENS_fitted.joblib')
        prediction = clf.predict(features)
        pred_prob = clf.predict_proba(features)
        # do operations and return template


if __name__ == "__main__":
    app.run(debug=True)
```
It all works locally. As soon as I deploy to Heroku, I get the following error on the joblib.load() call:
```text
Traceback (most recent call last):
2020-09-24T21:27:30.117559+00:00 app[web.1]: File "/app/.heroku/python/lib/python3.8/site-packages/flask/app.py", line 2447, in wsgi_app
2020-09-24T21:27:30.117559+00:00 app[web.1]: response = self.full_dispatch_request()
2020-09-24T21:27:30.117560+00:00 app[web.1]: File "/app/.heroku/python/lib/python3.8/site-packages/flask/app.py", line 1952, in full_dispatch_request
2020-09-24T21:27:30.117560+00:00 app[web.1]: rv = self.handle_user_exception(e)
2020-09-24T21:27:30.117561+00:00 app[web.1]: File "/app/.heroku/python/lib/python3.8/site-packages/flask/app.py", line 1821, in handle_user_exception
2020-09-24T21:27:30.117561+00:00 app[web.1]: reraise(exc_type, exc_value, tb)
2020-09-24T21:27:30.117561+00:00 app[web.1]: File "/app/.heroku/python/lib/python3.8/site-packages/flask/_compat.py", line 39, in reraise
2020-09-24T21:27:30.117562+00:00 app[web.1]: raise value
2020-09-24T21:27:30.117563+00:00 app[web.1]: File "/app/.heroku/python/lib/python3.8/site-packages/flask/app.py", line 1950, in full_dispatch_request
2020-09-24T21:27:30.117563+00:00 app[web.1]: rv = self.dispatch_request()
2020-09-24T21:27:30.117563+00:00 app[web.1]: File "/app/.heroku/python/lib/python3.8/site-packages/flask/app.py", line 1936, in dispatch_request
2020-09-24T21:27:30.117564+00:00 app[web.1]: return self.view_functions[rule.endpoint](**req.view_args)
2020-09-24T21:27:30.117564+00:00 app[web.1]: File "/app/flask_start.py", line 138, in home
2020-09-24T21:27:30.117564+00:00 app[web.1]: clf = joblib.load('ENS_fitted.joblib')
2020-09-24T21:27:30.117565+00:00 app[web.1]: File "/app/.heroku/python/lib/python3.8/site-packages/joblib/numpy_pickle.py", line 585, in load
2020-09-24T21:27:30.117565+00:00 app[web.1]: obj = _unpickle(fobj, filename, mmap_mode)
2020-09-24T21:27:30.117566+00:00 app[web.1]: File "/app/.heroku/python/lib/python3.8/site-packages/joblib/numpy_pickle.py", line 504, in _unpickle
2020-09-24T21:27:30.117566+00:00 app[web.1]: obj = unpickler.load()
2020-09-24T21:27:30.117567+00:00 app[web.1]: File "/app/.heroku/python/lib/python3.8/pickle.py", line 1210, in load
2020-09-24T21:27:30.117570+00:00 app[web.1]: dispatch[key[0]](self)
2020-09-24T21:27:30.117570+00:00 app[web.1]: File "/app/.heroku/python/lib/python3.8/pickle.py", line 1526, in load_global
2020-09-24T21:27:30.117570+00:00 app[web.1]: klass = self.find_class(module, name)
2020-09-24T21:27:30.117571+00:00 app[web.1]: File "/app/.heroku/python/lib/python3.8/pickle.py", line 1581, in find_class
2020-09-24T21:27:30.117571+00:00 app[web.1]: return getattr(sys.modules[module], name)
AttributeError: module '__main__' has no attribute 'FlexibleScaler'
```
I can't figure out why this happens. The import is there, and it works locally. I also tried copying the FlexibleScaler class directly into flask_start.py (which also works locally), but it still fails on Heroku.
The only difference between my local setup and Heroku is that on Heroku I use gunicorn to start the app.
Any help would be appreciated.
Upvotes: 3
Views: 817
Reputation: 24966
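It appears that the `joblib.dump()` that produced `ENS_fitted.joblib` happened while `flask_start.py` was run directly from python, with `FlexibleScaler` defined in that script. When that's the case, `flask_start` has a `__name__` of `"__main__"`. Pickle stores a class by reference (its module name plus its class name, not its code), so when `joblib.dump()` pickles the model, it saves the `FlexibleScaler` instance as a `__main__.FlexibleScaler`.

Locally this keeps working because running `python flask_start.py` makes that file `__main__` again, and the `FlexibleScaler` name it defines or imports is then reachable as `__main__.FlexibleScaler`.

But when you deploy and run under gunicorn, `flask_start` is imported as an ordinary module with a `__name__` of `"flask_start"`, while `__main__` is gunicorn's own entry point. This confounds `joblib.load()`, which expects to find a `__main__.FlexibleScaler`, and it gives up as you've shown above.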
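To see the mechanism in isolation, here is a minimal sketch using plain `pickle` (the traceback shows `joblib.load()` going through the same `find_class` lookup). `Demo` is a hypothetical stand-in for `FlexibleScaler`, and the file is assumed to be run directly with `python`, so the class lives in `__main__`. Deleting the class from that module imitates the gunicorn worker, where `__main__` is something else.

```python
import pickle
import sys


class Demo:
    """Hypothetical stand-in for FlexibleScaler, defined in the script being run."""

    def __init__(self, scaler=None):
        self.scaler = scaler


# pickle stores the class by reference: the defining module's name plus the class name.
payload = pickle.dumps(Demo("min-max"))
module_name = Demo.__module__  # "__main__" when this file is run with `python demo.py`
print("class recorded as:", f"{module_name}.{Demo.__qualname__}")

# Simulate the gunicorn worker: the module named in the pickle has no such attribute.
module = sys.modules[module_name]
original = module.Demo
del module.Demo
try:
    pickle.loads(payload)
    load_error = None
except AttributeError as exc:  # same exception type as in the Heroku traceback
    load_error = exc
finally:
    module.Demo = original  # restore the class so the rest of the script still works

print("load failed with:", repr(load_error))
```

Nothing about the class's code is in the file, only where to find it, which is why the same model file loads or fails depending on what `__main__` contains at load time.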
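A solution to this is to regenerate your saved model, but this time invoking `flask_start` via

```shell
% FLASK_APP=flask_start flask run
```

so that `flask_start` is imported under its own module name; then call `joblib.dump()` again and re-deploy. The class is then recorded under an importable module name (`flask_start`, or `flexible` if it is imported from `flexible.py`) rather than `__main__`.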
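The point of regenerating is that the class ends up recorded under a module name any process can import. A minimal, self-contained sketch of that principle with plain `pickle`: it writes a throwaway module `flexible_demo.py` (a hypothetical stand-in for `flexible.py`) to a temporary directory, imports the class from it, and checks which module the pickle references. It is an illustration, not the author's training script.

```python
import pickle
import sys
import tempfile
from pathlib import Path

# Write a tiny hypothetical module standing in for flexible.py.
tmp_dir = Path(tempfile.mkdtemp())
(tmp_dir / "flexible_demo.py").write_text(
    "class FlexibleScaler:\n"
    "    def __init__(self, scaler=None):\n"
    "        self.scaler = scaler\n"
)
sys.path.insert(0, str(tmp_dir))

# Imported (not defined in the running script), so __module__ is "flexible_demo".
from flexible_demo import FlexibleScaler

payload = pickle.dumps(FlexibleScaler("min-max"))
print(FlexibleScaler.__module__)  # flexible_demo

# Any process that can import flexible_demo can load this, whatever __main__ is
# (python script, flask run, gunicorn worker).
restored = pickle.loads(payload)
print(type(restored).__module__, restored.scaler)  # flexible_demo min-max
```

In the real project the equivalent is to make sure the script that calls `joblib.dump()` gets `FlexibleScaler` via `from flexible import FlexibleScaler` rather than defining it inline.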
Updated
If you're absolutely unable to get the model regenerated, you can try this hack. After the imports in `flask_start.py`, add

```python
import __main__
__main__.FlexibleScaler = FlexibleScaler
```

You'll either be able to `joblib.load()` the model, or you'll run into a similar error with another class, in which case, repeat this trick for that class.
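If several classes need this treatment, the assignment can be wrapped in a small helper. A sketch, assuming plain `pickle` in place of joblib and a toy `FlexibleScaler` defined inline; `expose_in_main` is a hypothetical name, not part of joblib or sklearn. The payload is built so that it refers to `__main__.FlexibleScaler`, like a model dumped from a directly-run script.

```python
import __main__
import pickle


def expose_in_main(*classes):
    """Bind each class on the __main__ module under its own name (the hack, generalized)."""
    for cls in classes:
        setattr(__main__, cls.__name__, cls)


class FlexibleScaler:
    """Toy stand-in; in the app this would be `from flexible import FlexibleScaler`."""

    def __init__(self, scaler=None):
        self.scaler = scaler


scaler_cls = FlexibleScaler  # keep a reference that survives the deletion below

# Reproduce a model file dumped from a directly-run script: it refers to __main__.FlexibleScaler.
scaler_cls.__module__ = "__main__"
expose_in_main(scaler_cls)
payload = pickle.dumps(scaler_cls("robust"))

# Simulate the gunicorn worker, whose __main__ has no FlexibleScaler attribute.
delattr(__main__, "FlexibleScaler")
try:
    pickle.loads(payload)
    error_before = None
except AttributeError as exc:
    error_before = exc

# Apply the workaround; the same payload now loads.
expose_in_main(scaler_cls)
model = pickle.loads(payload)
print("before:", repr(error_before))
print("after:", type(model).__name__, model.scaler)  # after: FlexibleScaler robust
```

In `flask_start.py` the call would go right after the imports, e.g. `expose_in_main(FlexibleScaler)`, listing any further classes that show up in later errors.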
Upvotes: 3
Reputation: 15
I would try putting `clf = joblib.load('ENS_fitted.joblib')` into a try-except block to see whether the exception there is the same as the one you're getting:

AttributeError: module '__main__' has no attribute 'FlexibleScaler'

For example:

```python
try:
    clf = joblib.load('ENS_fitted.joblib')
    prediction = clf.predict(features)
    pred_prob = clf.predict_proba(features)
except Exception as e:
    print(f"Exception: {e}")
```
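`print()` only shows the exception message. A variant that uses the standard library's `logger.exception`, which also records the full traceback (the part that ends up in `heroku logs`), is sketched below. `load_model_or_log` and `failing_loader` are hypothetical names; `failing_loader` stands in for `joblib.load` and raises the error from the question so the sketch runs without the model file.

```python
import io
import logging


def load_model_or_log(loader, path, logger):
    """Call loader(path); on failure, log the full traceback and re-raise."""
    try:
        return loader(path)
    except Exception:
        # logger.exception logs at ERROR level and appends the current traceback.
        logger.exception("Could not load model from %s", path)
        raise


def failing_loader(path):
    # Hypothetical stand-in for joblib.load that reproduces the question's error.
    raise AttributeError("module '__main__' has no attribute 'FlexibleScaler'")


buffer = io.StringIO()
logger = logging.getLogger("model-loading-demo")
logger.addHandler(logging.StreamHandler(buffer))  # in the app, stderr would be used instead
logger.propagate = False

try:
    load_model_or_log(failing_loader, "ENS_fitted.joblib", logger)
except AttributeError:
    pass

print(buffer.getvalue())
```

Re-raising keeps the request from continuing with an undefined `clf`; the log still shows exactly where the load failed.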
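Also, I recommend checking whether gunicorn runs the program with `__name__ == "__main__"` by printing a notice to yourself; if the message never appears, the module is being imported rather than run as a script:

```python
if __name__ == "__main__":
    print("__name__ is __main__")
    app.run(debug=True)
```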
If you can't get past the error after doing this, I would look into configuring gunicorn with flask.
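The difference that this check probes can be reproduced without gunicorn. A minimal sketch: it writes a throwaway module `name_probe.py` (hypothetical) that records its own `__name__`, then loads it once with `importlib.import_module`, which is roughly how gunicorn loads the module named in `module:app`, and once with `runpy.run_path(..., run_name="__main__")`, which mimics `python name_probe.py`.

```python
import importlib
import runpy
import sys
import tempfile
from pathlib import Path

# Hypothetical probe module that remembers the __name__ it was executed under.
tmp_dir = Path(tempfile.mkdtemp())
probe_path = tmp_dir / "name_probe.py"
probe_path.write_text("RUN_AS = __name__\n")
sys.path.insert(0, str(tmp_dir))

# Imported as a module (the gunicorn case).
imported = importlib.import_module("name_probe")
# Executed as a script (the `python name_probe.py` case).
script_globals = runpy.run_path(str(probe_path), run_name="__main__")

print(imported.RUN_AS)           # name_probe
print(script_globals["RUN_AS"])  # __main__
```

This is why the `if __name__ == "__main__":` block is skipped under gunicorn, and why a pickle that points at `__main__` cannot find classes that live in the imported app module.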
Upvotes: 2