NicoA

Reputation: 33

Python Flask app runs locally, but returns AttributeError when hosted on Heroku

I am developing an application for university. The web app loads a saved model with joblib, and the model depends on the class FlexibleScaler:

flexible.py

from sklearn.preprocessing import MinMaxScaler, StandardScaler, PowerTransformer, MaxAbsScaler, RobustScaler, Normalizer
from sklearn.base import BaseEstimator, TransformerMixin

class FlexibleScaler(BaseEstimator, TransformerMixin):
    def __init__(self, scaler=None):
        self.scaler = scaler
        self.check = False


    def __assign_scaler(self):
        if self.scaler == 'min-max':
            self.method = MinMaxScaler()
        elif self.scaler == 'standard':
            self.method = StandardScaler()
        elif self.scaler == 'yeo-johnson':
            self.method = PowerTransformer(method='yeo-johnson')
        elif self.scaler == 'box-cox':
            self.method = PowerTransformer(method='box-cox')
        elif self.scaler == 'max-abs':
            self.method = MaxAbsScaler()
        elif self.scaler == 'robust':
            self.method = RobustScaler()
        elif self.scaler == 'normalize':
            self.method = Normalizer()
        else:
            self.method = None
        self.check = True

    def fit_transform(self, X, y=None, **fit_params):
        if not self.check:
            self.__assign_scaler()
        if self.method is None:
            return X
        return self.method.fit_transform(X, y, **fit_params)

    def fit(self, X, y=None):
        if not self.check:
            self.__assign_scaler()
        if self.method is not None:
            self.method.fit(X)
        # scikit-learn convention: fit returns the estimator itself
        return self

    def transform(self, X):
        if not self.check:
            self.__assign_scaler()
        if self.method is None:
            return X
        return self.method.transform(X)

flask_start.py

from flask import Flask, Response, render_template, request, flash, redirect, session, g
import joblib
import pandas as pd
import numpy as np
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import MinMaxScaler, StandardScaler, PowerTransformer, MaxAbsScaler, RobustScaler, Normalizer
from sklearn.base import BaseEstimator, TransformerMixin
from flexible import FlexibleScaler


UPLOAD_FOLDER = '/tmp/'
ALLOWED_EXTENSIONS = {'csv'}
app = Flask(__name__)

app.config['UPLOAD_FOLDER'] = UPLOAD_FOLDER
app.config['ALLOWED_EXTENSIONS'] = ALLOWED_EXTENSIONS


@app.route("/", methods=["POST", "GET"])
def home():

    if request.method == 'POST':

        # get data to process

        clf = joblib.load('ENS_fitted.joblib')

        prediction = clf.predict(features)
        pred_prob = clf.predict_proba(features)

        # do operations and return template

if __name__ == "__main__":
    app.run(debug = True)

It all works locally. As soon as I deploy on Heroku, I get the following error on the joblib.load():

Traceback (most recent call last):
2020-09-24T21:27:30.117559+00:00 app[web.1]:   File "/app/.heroku/python/lib/python3.8/site-packages/flask/app.py", line 2447, in wsgi_app
2020-09-24T21:27:30.117559+00:00 app[web.1]:     response = self.full_dispatch_request()
2020-09-24T21:27:30.117560+00:00 app[web.1]:   File "/app/.heroku/python/lib/python3.8/site-packages/flask/app.py", line 1952, in full_dispatch_request
2020-09-24T21:27:30.117560+00:00 app[web.1]:     rv = self.handle_user_exception(e)
2020-09-24T21:27:30.117561+00:00 app[web.1]:   File "/app/.heroku/python/lib/python3.8/site-packages/flask/app.py", line 1821, in handle_user_exception
2020-09-24T21:27:30.117561+00:00 app[web.1]:     reraise(exc_type, exc_value, tb)
2020-09-24T21:27:30.117561+00:00 app[web.1]:   File "/app/.heroku/python/lib/python3.8/site-packages/flask/_compat.py", line 39, in reraise
2020-09-24T21:27:30.117562+00:00 app[web.1]:     raise value
2020-09-24T21:27:30.117563+00:00 app[web.1]:   File "/app/.heroku/python/lib/python3.8/site-packages/flask/app.py", line 1950, in full_dispatch_request
2020-09-24T21:27:30.117563+00:00 app[web.1]:     rv = self.dispatch_request()
2020-09-24T21:27:30.117563+00:00 app[web.1]:   File "/app/.heroku/python/lib/python3.8/site-packages/flask/app.py", line 1936, in dispatch_request
2020-09-24T21:27:30.117564+00:00 app[web.1]:     return self.view_functions[rule.endpoint](**req.view_args)
2020-09-24T21:27:30.117564+00:00 app[web.1]:   File "/app/flask_start.py", line 138, in home
2020-09-24T21:27:30.117564+00:00 app[web.1]:     clf = joblib.load('ENS_fitted.joblib')
2020-09-24T21:27:30.117565+00:00 app[web.1]:   File "/app/.heroku/python/lib/python3.8/site-packages/joblib/numpy_pickle.py", line 585, in load
2020-09-24T21:27:30.117565+00:00 app[web.1]:     obj = _unpickle(fobj, filename, mmap_mode)
2020-09-24T21:27:30.117566+00:00 app[web.1]:   File "/app/.heroku/python/lib/python3.8/site-packages/joblib/numpy_pickle.py", line 504, in _unpickle
2020-09-24T21:27:30.117566+00:00 app[web.1]:     obj = unpickler.load()
2020-09-24T21:27:30.117567+00:00 app[web.1]:   File "/app/.heroku/python/lib/python3.8/pickle.py", line 1210, in load
2020-09-24T21:27:30.117570+00:00 app[web.1]:     dispatch[key[0]](self)
2020-09-24T21:27:30.117570+00:00 app[web.1]:   File "/app/.heroku/python/lib/python3.8/pickle.py", line 1526, in load_global
2020-09-24T21:27:30.117570+00:00 app[web.1]:     klass = self.find_class(module, name)
2020-09-24T21:27:30.117571+00:00 app[web.1]:   File "/app/.heroku/python/lib/python3.8/pickle.py", line 1581, in find_class
2020-09-24T21:27:30.117571+00:00 app[web.1]:     return getattr(sys.modules[module], name)
AttributeError: module '__main__' has no attribute 'FlexibleScaler'

I don't understand why this happens. The import is there and works locally. I tried copying the class FlexibleScaler directly into flask_start.py (which also works locally), but no luck.

The only thing that changes between local and Heroku is that on Heroku I use gunicorn to start the app.

Please, any help would be appreciated.

Upvotes: 3

Views: 817

Answers (2)

Dave W. Smith

Reputation: 24966

It appears that the joblib.dump() call that produced ENS_fitted.joblib ran while flask_start.py was being executed directly with python. When that's the case, flask_start has a __name__ of "__main__", so when joblib.dump() pickles the model, it records the FlexibleScaler instance's class as __main__.FlexibleScaler.

But when you deploy and run under gunicorn, flask_start is imported as a module and has a __name__ of "flask_start"; __main__ is gunicorn's own entry script. joblib.load() looks for __main__.FlexibleScaler, doesn't find it there, and fails with the AttributeError you've shown above.
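The mechanism can be reproduced without Flask, gunicorn, or joblib. The sketch below is only an illustration using plain pickle: the module name fake_main and the helper install_module are made up for the demo, with fake_main standing in for __main__. It shows that the pickle stores a module name plus a class name, and that loading fails with an AttributeError when a module of that name exists but doesn't define the class.

```python
import pickle
import sys
import types

CLASS_SOURCE = """
class FlexibleScaler:
    def __init__(self, scaler=None):
        self.scaler = scaler
"""

def install_module(name, source=""):
    """Hypothetical helper: build a module called `name`, run `source` in it,
    and register it in sys.modules so pickle can look it up."""
    module = types.ModuleType(name)
    exec(source, module.__dict__)
    sys.modules[name] = module
    return module

# "Training" run: the class is defined in the module being executed.
# fake_main is a stand-in for __main__.
trainer = install_module("fake_main", CLASS_SOURCE)
payload = pickle.dumps(trainer.FlexibleScaler("standard"))

# The pickle holds a reference (module name + class name), not the class body.
recorded_module = trainer.FlexibleScaler.__module__
print(recorded_module)  # fake_main

# "Serving" run: a module with the same name exists, but it is a different
# program (think: gunicorn's entry script) that does not define the class.
install_module("fake_main")
try:
    pickle.loads(payload)
    load_error = None
except AttributeError as exc:
    load_error = exc

print(type(load_error).__name__)  # AttributeError
```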

A solution to this is to regenerate your saved model, but this time by invoking flask_start via

% FLASK_APP=flask_start flask run

then call joblib.dump() again and redeploy.

Updated

If you're absolutely unable to get the model regenerated, you can try this hack. After the imports in flask_start.py, add

import __main__
__main__.FlexibleScaler = FlexibleScaler

You'll either be able to joblib.load() the model, or you'll run into a similar error with another class, in which case, repeat this trick.
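Here is a small sketch of why that aliasing trick works, again with plain pickle rather than joblib. The names stand_in_main, flexible_demo, and install_module are invented for the demo; stand_in_main plays the role of __main__ so the example doesn't touch the real one.

```python
import pickle
import sys
import types

CLASS_SOURCE = """
class FlexibleScaler:
    def __init__(self, scaler=None):
        self.scaler = scaler
"""

def install_module(name, source=""):
    """Hypothetical helper: build a module called `name`, run `source` in it,
    and register it in sys.modules so pickle can look it up."""
    module = types.ModuleType(name)
    exec(source, module.__dict__)
    sys.modules[name] = module
    return module

# Training run: the pickle records the class as stand_in_main.FlexibleScaler.
old_main = install_module("stand_in_main", CLASS_SOURCE)
payload = pickle.dumps(old_main.FlexibleScaler("robust"))

# Serving run: the class now lives in its own module, and the entry-point
# module is some other program that knows nothing about it.
flexible_demo = install_module("flexible_demo", CLASS_SOURCE)
server_main = install_module("stand_in_main")

# The hack: expose the class under the name the pickle is looking for.
server_main.FlexibleScaler = flexible_demo.FlexibleScaler

restored = pickle.loads(payload)
print(type(restored).__module__, restored.scaler)  # flexible_demo robust
```

The restored object is an instance of the class from flexible_demo, so the alias only has to exist at load time. It is still a workaround: regenerating the model so that it references an importable module is the cleaner fix.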

Upvotes: 3

ChargingBull

Reputation: 15

I would try putting

clf = joblib.load('ENS_fitted.joblib')

into a try-except block to see whether the exception raised there is the same as the one you're seeing:

AttributeError: module '__main__' has no attribute 'FlexibleScaler'

For example:

try:
    clf = joblib.load('ENS_fitted.joblib')
    prediction = clf.predict(features)
    pred_prob = clf.predict_proba(features)
except Exception as e:
    print(f"Exception: {e}")

Also, I recommend checking whether gunicorn runs the program with __name__ == "__main__" by printing a notice to yourself:

if __name__ == "__main__":
    print("__name__ is __main__")
    app.run(debug = True)

If you can't get past the error after doing this, I would look into configuring gunicorn with flask.
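To see what that check is likely to report, here is a rough sketch that runs the same file both ways: once as a script (via runpy with run_name="__main__") and once as an import, which is how a WSGI server such as gunicorn loads an app module. The file name flask_start_demo.py is just a placeholder for the demo.

```python
import importlib.util
import runpy
import tempfile
from pathlib import Path

# A one-line module that remembers the name it was loaded under.
SOURCE = "module_name = __name__\n"

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "flask_start_demo.py"  # placeholder file name
    path.write_text(SOURCE)

    # Equivalent to `python flask_start_demo.py`.
    as_script = runpy.run_path(str(path), run_name="__main__")["module_name"]

    # Equivalent to another program importing the module.
    spec = importlib.util.spec_from_file_location("flask_start_demo", path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    as_import = module.module_name

print(as_script)  # __main__
print(as_import)  # flask_start_demo
```

So when the module is imported, the if __name__ == "__main__": block doesn't run at all, and the notice won't be printed.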

Upvotes: 2
