Reputation: 41
I have a dataframe with 3 features and 3 classes that I split into X_train, Y_train, X_test, and Y_test. I then run sklearn's Pipeline with PCA, StandardScaler, and finally LogisticRegression. I want to calculate the class probabilities directly from the LR weights and the raw data, without using predict_proba, but I don't know how, because I'm not sure exactly how the pipeline pipes X_test through PCA and StandardScaler into the logistic regression. Is this realistic without being able to use PCA's and StandardScaler's fit method?
So far, I have:
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

pca = PCA(whiten=True)
scaler = StandardScaler()
logistic = LogisticRegression(fit_intercept=True, class_weight='balanced', solver='sag', n_jobs=-1, C=1.0, max_iter=200)
pipe = Pipeline(steps=[('pca', pca), ('scaler', scaler), ('logistic', logistic)])
pipe.fit(X_train, Y_train)
predict_probs = pipe.predict_proba(X_test)
coefficients = pipe.steps[2][1].coef_  # 3 by 30
intercepts = pipe.steps[2][1].intercept_  # 1 by 3
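For reference, here is a minimal sketch of how the three pipeline steps could be reproduced by hand from the fitted attributes. It makes several assumptions: synthetic data from make_classification stands in for the real dataframe, the default solver replaces 'sag' so the fit converges quietly, and the model is multinomial (the default for multiclass problems in recent scikit-learn versions), so the probabilities are a softmax of the linear scores. A one-vs-rest model would instead need a per-class sigmoid followed by row normalization.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (an assumption; the original dataframe is not shown).
X, y = make_classification(n_samples=300, n_features=6, n_informative=4,
                           n_redundant=0, n_classes=3, random_state=0)
X_train, X_test, Y_train, Y_test = X[:200], X[200:], y[:200], y[200:]

pipe = Pipeline(steps=[
    ("pca", PCA(whiten=True)),
    ("scaler", StandardScaler()),
    ("logistic", LogisticRegression(max_iter=1000)),
])
pipe.fit(X_train, Y_train)

pca = pipe.named_steps["pca"]
scaler = pipe.named_steps["scaler"]
logistic = pipe.named_steps["logistic"]

# Step 1: PCA transform with parameters learned on X_train:
# center, project onto the components, then whiten.
Z = (X_test - pca.mean_) @ pca.components_.T
Z = Z / np.sqrt(pca.explained_variance_)

# Step 2: StandardScaler transform with the training mean and scale.
Z = (Z - scaler.mean_) / scaler.scale_

# Step 3: linear scores, then a numerically stable softmax.
logits = Z @ logistic.coef_.T + logistic.intercept_
logits = logits - logits.max(axis=1, keepdims=True)
exp_logits = np.exp(logits)
manual_probs = exp_logits / exp_logits.sum(axis=1, keepdims=True)

# Compare against the pipeline's own output.
matches = np.allclose(manual_probs, pipe.predict_proba(X_test))
print(matches)
```

Nothing is refitted on X_test here: every step only uses attributes (mean_, components_, explained_variance_, scale_, coef_, intercept_) that were stored when the pipeline was fitted on X_train.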
Upvotes: 0
Views: 5393
Reputation: 21
I had the same question, so thanks for Kumar's answer. I had assumed that the pipeline would fit a new transform on x_test. But when I compared a Pipeline composed of StandardScaler and LogisticRegression against my own function that applies StandardScaler and LogisticRegression step by step, I found that the Pipeline actually uses the transform fitted on x_train. So don't worry about using Pipeline; it's a really convenient and useful tool for machine learning!
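The comparison described above could look roughly like the following sketch. The data is synthetic (an assumption, since the original x_train and x_test are not shown), and the variable names are only illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (an assumption, not the original dataset).
X, y = make_classification(n_samples=300, n_features=5, n_informative=3,
                           n_redundant=0, n_classes=3,
                           n_clusters_per_class=1, random_state=1)
x_train, x_test, y_train = X[:200], X[200:], y[:200]

# Version 1: let Pipeline handle scaling and classification.
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("logistic", LogisticRegression(max_iter=1000)),
])
pipe.fit(x_train, y_train)

# Version 2: do the same steps by hand, fitting the scaler on x_train only.
scaler = StandardScaler().fit(x_train)
model = LogisticRegression(max_iter=1000).fit(scaler.transform(x_train), y_train)
manual_probs = model.predict_proba(scaler.transform(x_test))

# The pipeline's scaler holds the training mean, not the test mean.
same_scaler = np.allclose(pipe.named_steps["scaler"].mean_, x_train.mean(axis=0))
# Both versions give the same probabilities on x_test.
same_probs = np.allclose(pipe.predict_proba(x_test), manual_probs)
print(same_scaler, same_probs)
```

If the pipeline refitted the scaler on x_test at prediction time, its stored mean would not equal the training mean and the two sets of probabilities would differ.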
Upvotes: 0