Reputation: 23
I want to incorporate degree-two polynomial features into my logistic regression model (which has two predictor variables), as I have tried below:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

df_poly = df[['Y', 'x0', 'x1']].copy()
X_train, X_test, Y_train, Y_test = train_test_split(
    df_poly.drop('Y', axis=1), df_poly['Y'],
    test_size=0.20, random_state=10)
poly = PolynomialFeatures(degree=2, interaction_only=False, include_bias=False)
lr = LogisticRegression()
pipe = Pipeline([('polynomial_features', poly), ('logistic_regression', lr)])
pipe.fit(X_train, Y_train)
I would get coefficients for x0, x1, x0^2, x1^2, x0*x1.
Instead, I want to fit on just x0, x1, x0^2, and x0*x1; that is, I want to exclude the x1^2 term. Is there a way to do this with scikit-learn?
Upvotes: 0
Views: 737
Reputation: 1758
I would use a combination of ColumnTransformer, PolynomialFeatures, and FunctionTransformer:
import numpy as np
import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import PolynomialFeatures, FunctionTransformer
X = pd.DataFrame({'X0': np.arange(10), 'X1': np.arange(10, 20)})
ct = ColumnTransformer([
    ('poly_X0X1', PolynomialFeatures(degree=2, interaction_only=True, include_bias=False), ['X0', 'X1']),
    ('poly_x0', FunctionTransformer(func=lambda x: x**2), ['X0']),
])
poly = ct.fit_transform(X)
poly # X0, X1, X0*X1, X0^2
array([[ 0., 10., 0., 0.],
[ 1., 11., 11., 1.],
[ 2., 12., 24., 4.],
[ 3., 13., 39., 9.],
[ 4., 14., 56., 16.],
[ 5., 15., 75., 25.],
[ 6., 16., 96., 36.],
[ 7., 17., 119., 49.],
[ 8., 18., 144., 64.],
[ 9., 19., 171., 81.]])
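To use this in place of the plain PolynomialFeatures step from the question, the ColumnTransformer can be dropped straight into a Pipeline ahead of the classifier. A minimal sketch, using a small synthetic dataset standing in for your df_poly (the data and the step name 'features' are assumptions, not from the question):

```python
import numpy as np
import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import PolynomialFeatures, FunctionTransformer
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the question's df_poly.
rng = np.random.default_rng(10)
X = pd.DataFrame({'X0': rng.normal(size=100), 'X1': rng.normal(size=100)})
y = (X['X0'] + X['X1'] > 0).astype(int)

# Generates X0, X1, X0*X1 (interaction_only=True skips the squares),
# plus X0^2 from the FunctionTransformer -- but no X1^2.
ct = ColumnTransformer([
    ('poly_X0X1', PolynomialFeatures(degree=2, interaction_only=True, include_bias=False), ['X0', 'X1']),
    ('poly_x0', FunctionTransformer(func=lambda x: x**2), ['X0']),
])

pipe = Pipeline([('features', ct), ('logistic_regression', LogisticRegression())])
pipe.fit(X, y)

# One coefficient per generated feature: X0, X1, X0*X1, X0^2.
print(pipe.named_steps['logistic_regression'].coef_.shape)  # (1, 4)
```

The fitted model ends up with exactly four coefficients, matching the four columns produced by the transformer above.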
Upvotes: 1