Jeremy

Reputation: 41

Computing Pipeline logistic regression predict_proba in sklearn

I have a dataframe with 3 features and 3 classes that I split into X_train, Y_train, X_test, and Y_test. I then run sklearn's Pipeline with PCA, StandardScaler, and finally LogisticRegression. I want to calculate the class probabilities directly from the logistic regression weights and the raw data, without calling predict_proba, but I'm not sure exactly how Pipeline passes X_test through PCA and StandardScaler into the logistic regression. Is this realistic without being able to use PCA's and StandardScaler's fit methods?

So far, I have:

from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

pca = PCA(whiten=True)
scaler = StandardScaler()
# The solver name must be passed as a string
logistic = LogisticRegression(fit_intercept=True, class_weight='balanced', solver='sag', n_jobs=-1, C=1.0, max_iter=200)

pipe = Pipeline(steps=[('pca', pca), ('scaler', scaler), ('logistic', logistic)])

pipe.fit(X_train, Y_train)

predict_probs = pipe.predict_proba(X_test)

coefficients = pipe.steps[2][1].coef_  # 3 by 30
intercepts = pipe.steps[2][1].intercept_  # 1 by 3

Upvotes: 0

Views: 5393

Answers (1)

Erik

Reputation: 21

I had the same question, so thanks for Kumar's answer. I assumed that Pipeline would fit a new transform on X_test, but when I compared a Pipeline composed of StandardScaler and LogisticRegression against my own function that applies StandardScaler and LogisticRegression step by step, I found that Pipeline actually applies the transform fitted on X_train. In other words, predict_proba only calls transform (never fit) on each intermediate step before handing the result to the logistic regression. So don't worry about using Pipeline, it's really a convenient and useful tool for machine learning!
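To make this concrete, here is a minimal sketch of reproducing predict_proba by hand from the fitted steps. It makes several assumptions that are not in the question: the data is a synthetic stand-in generated with make_classification, the logistic regression uses the default solver rather than 'sag', and the model is the multinomial (softmax) formulation that recent scikit-learn versions use for multi-class problems. A one-vs-rest model would need normalized sigmoids instead of the softmax.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic 3-class data standing in for the asker's dataframe (assumption).
X, y = make_classification(n_samples=300, n_features=6, n_informative=4,
                           n_redundant=0, n_classes=3, random_state=0)
X_train, X_test, Y_train, Y_test = train_test_split(X, y, random_state=0)

pipe = Pipeline(steps=[
    ('pca', PCA(whiten=True)),
    ('scaler', StandardScaler()),
    ('logistic', LogisticRegression(max_iter=200)),  # default solver (assumption)
])
pipe.fit(X_train, Y_train)

pca = pipe.named_steps['pca']
scaler = pipe.named_steps['scaler']
logistic = pipe.named_steps['logistic']

# Step 1: PCA with whitening, using the mean and components learned on X_train.
Z = (X_test - pca.mean_) @ pca.components_.T
Z = Z / np.sqrt(pca.explained_variance_)

# Step 2: standardization, using the mean and scale learned on X_train.
Z = (Z - scaler.mean_) / scaler.scale_

# Step 3: linear scores per class, then a numerically stable softmax.
scores = Z @ logistic.coef_.T + logistic.intercept_
scores = scores - scores.max(axis=1, keepdims=True)
exp_scores = np.exp(scores)
manual_probs = exp_scores / exp_scores.sum(axis=1, keepdims=True)

# Compare against the pipeline's own output.
matches = np.allclose(manual_probs, pipe.predict_proba(X_test))
print(matches)
```

Nothing is refit on X_test here: every quantity used (pca.mean_, pca.components_, pca.explained_variance_, scaler.mean_, scaler.scale_, coef_, intercept_) is an attribute stored when pipe.fit(X_train, Y_train) was called.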

Upvotes: 0
