Dumb chimp
Dumb chimp

Reputation: 454

AttributeError: 'LinearRegression' object has no attribute 'predict_proba'

I need to create a custom transformer to be input into a grader.

The grader passes a list of dictionaries to the predict or predict_proba method of my estimator, not a DataFrame. This means that the model must work with both data types. For this reason, I need to provide a custom ColumnSelectTransformer to use instead scikit-learn's own ColumnTransformer.

This is my code for the custom transformer that aims to impute null values in the columns provided.

from sklearn.impute import SimpleImputer

simple_cols = ['BEDCERT', 'RESTOT', 'INHOSP', 'CCRC_FACIL', 'SFF', 'CHOW_LAST_12MOS', 'SPRINKLER_STATUS', 'EXP_TOTAL', 'ADJ_TOTAL']

class ColumnSelectTransformer(BaseEstimator, TransformerMixin):
    def __init__(self, columns):
        self.columns = columns

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        if not isinstance(X, pd.DataFrame):
            X = pd.DataFrame(X)
        return X[self.columns].values
        
simple_features = Pipeline([
    ('cst', ColumnSelectTransformer(simple_cols)),
    ('imputer', SimpleImputer(strategy='mean')),
])

I am then tasked to create a new pipeline and fit it with an estimator, and below is my attempt.

from sklearn.linear_model import LinearRegression

simple_features_model = Pipeline([
    ('simple', simple_features),
    ('linear', LinearRegression()),
])

simple_features_model.fit(data, fine_counts > 0)

The pipeline is generated successfully

Pipeline(memory=None,
         steps=[('simple',
                 Pipeline(memory=None,
                          steps=[('cst',
                                  ColumnSelectTransformer(columns=['BEDCERT',
                                                                   'RESTOT',
                                                                   'INHOSP',
                                                                   'CCRC_FACIL',
                                                                   'SFF',
                                                                   'CHOW_LAST_12MOS',
                                                                   'SPRINKLER_STATUS',
                                                                   'EXP_TOTAL',
                                                                   'ADJ_TOTAL'])),
                                 ('imputer',
                                  SimpleImputer(add_indicator=False, copy=True,
                                                fill_value=None,
                                                missing_values=nan,
                                                strategy='mean', verbose=0))],
                          verbose=False)),
                ('linear',
                 LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None,
                                  normalize=False))],
         verbose=False)

However, when I pass my simple_features_model into my school's grader

def positive_probability(model):
    def predict_proba(X):
        return model.predict_proba(X)[:, 1]
    return predict_proba

grader.score.ml__simple_features(positive_probability(simple_features_model))

I get the following error

AttributeError                            Traceback (most recent call last)
<ipython-input-87-243f592b48ee> in <module>()
      4     return predict_proba
      5 
----> 6 grader.score.ml__simple_features(positive_probability(simple_features_model))

/opt/conda/lib/python3.7/site-packages/static_grader/grader.py in func(*args, **kw)
     92   def __getattr__(self, method):
     93     def func(*args, **kw):
---> 94       return self(method, *args, **kw)
     95     return func
     96 

/opt/conda/lib/python3.7/site-packages/static_grader/grader.py in __call__(self, question_name, func)
     88       return
     89     test_cases = json.loads(resp.text)
---> 90     test_cases_grading(question_name, func, test_cases)
     91 
     92   def __getattr__(self, method):

/opt/conda/lib/python3.7/site-packages/static_grader/grader.py in test_cases_grading(question_name, func, test_cases)
     40   for test_case in test_cases:
     41     if inspect.isroutine(func):
---> 42       sub_res = func(*test_case['args'], **test_case['kwargs'])
     43     elif not test_case['args'] and not test_case['kwargs']:
     44       sub_res = func

<ipython-input-87-243f592b48ee> in predict_proba(X)
      1 def positive_probability(model):
      2     def predict_proba(X):
----> 3         return model.predict_proba(X)[:, 1]
      4     return predict_proba
      5 

/opt/conda/lib/python3.7/site-packages/sklearn/utils/metaestimators.py in __get__(self, obj, type)
    108                     continue
    109                 else:
--> 110                     getattr(delegate, self.attribute_name)
    111                     break
    112             else:

AttributeError: 'LinearRegression' object has no attribute 'predict_proba'

Upvotes: 5

Views: 15226

Answers (1)

desertnaut
desertnaut

Reputation: 60321

The linear regression module indeed does not have a predict_proba attribute (check the docs) for a very simple reason: probability estimations are only for classification models, and not for regression (i.e. numeric prediction) ones, such as linear regression.

Since it is not clear from your post if you are trying to do regression or classification:

  • If you are in a regression setting, just replace predict_proba with predict.
  • If you are in a classification setting, you cannot use linear regression - try logistic regression instead (despite the name, it is a classification algorithm), which does indeed have a predict_proba attribute (again, see the docs).

Upvotes: 9

Related Questions