Reputation: 23
I want to incorporate degree-two polynomial features into my logistic regression model (which has two predictor variables), as I have tried below:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

df_poly = df[['Y', 'x0', 'x1']].copy()
X_train, X_test, Y_train, Y_test = train_test_split(
    df_poly.drop('Y', axis=1), df_poly['Y'],
    test_size=0.20, random_state=10)
poly = PolynomialFeatures(degree=2, interaction_only=False, include_bias=False)
lr = LogisticRegression()
pipe = Pipeline([('polynomial_features', poly), ('logistic_regression', lr)])
pipe.fit(X_train, Y_train)
I would get coefficients for x0, x1, x0^2, x1^2, x0*x1.
Instead, I want to fit on just x0, x1, x0^2, and x0*x1; that is, I want to exclude the x1^2 term. Is there a way to do this with scikit-learn?
Upvotes: 0
Views: 737
Reputation: 1758
I would use a combination of ColumnTransformer, PolynomialFeatures, and FunctionTransformer:
import numpy as np
import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import PolynomialFeatures, FunctionTransformer
X = pd.DataFrame({'X0': np.arange(10), 'X1': np.arange(10, 20)})
ct = ColumnTransformer([
    ('poly_X0X1', PolynomialFeatures(degree=2, interaction_only=True, include_bias=False), ['X0', 'X1']),
    ('poly_x0', FunctionTransformer(func=lambda x: x**2), ['X0']),
])
poly = ct.fit_transform(X)
poly # X0, X1, X0*X1, X0^2
array([[ 0., 10., 0., 0.],
[ 1., 11., 11., 1.],
[ 2., 12., 24., 4.],
[ 3., 13., 39., 9.],
[ 4., 14., 56., 16.],
[ 5., 15., 75., 25.],
[ 6., 16., 96., 36.],
[ 7., 17., 119., 49.],
[ 8., 18., 144., 64.],
[ 9., 19., 171., 81.]])
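To use this in place of the plain PolynomialFeatures step from the question, the ColumnTransformer can be dropped straight into a Pipeline ahead of the classifier. A minimal sketch, using a small synthetic dataset standing in for your df_poly (the data and the step name 'features' are assumptions, not from the question):

```python
import numpy as np
import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import PolynomialFeatures, FunctionTransformer
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the question's df_poly.
rng = np.random.default_rng(10)
X = pd.DataFrame({'X0': rng.normal(size=100), 'X1': rng.normal(size=100)})
y = (X['X0'] + X['X1'] > 0).astype(int)

# Generates X0, X1, X0*X1 (interaction_only=True skips the squares),
# plus X0^2 from the FunctionTransformer -- but no X1^2.
ct = ColumnTransformer([
    ('poly_X0X1', PolynomialFeatures(degree=2, interaction_only=True, include_bias=False), ['X0', 'X1']),
    ('poly_x0', FunctionTransformer(func=lambda x: x**2), ['X0']),
])

pipe = Pipeline([('features', ct), ('logistic_regression', LogisticRegression())])
pipe.fit(X, y)

# One coefficient per generated feature: X0, X1, X0*X1, X0^2.
print(pipe.named_steps['logistic_regression'].coef_.shape)  # (1, 4)
```

The fitted model ends up with exactly four coefficients, matching the four columns produced by the transformer above.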
Upvotes: 1