Reputation: 866
I have a pandas DataFrame with 20 numeric features and a numeric response column. I would like to first apply PCA to reduce the dimensionality to 10 and then run linear regression to predict the numeric response. Currently I can do this in two steps:
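```python
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
# Step 1: scale the 20 features and reduce them to 10 principal components
pipeline = Pipeline([('scaling', StandardScaler()),
                     ('pca', PCA(n_components=10, whiten=True))])
newDF = pipeline.fit_transform(numericDF)
# Step 2: fit the regression on the reduced features
Y = df["Response"]
model = LinearRegression()
model.fit(newDF, Y)
```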
Is there a way to include the linear regression in the above pipeline? I ask because `fit_transform` is not supported by `LinearRegression`, and `fit_predict` can't be used with `PCA`. How can I run PCA and then linear regression in the same pipeline?
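Both claims can be checked by asking each estimator which methods it exposes. This is a small sketch using Python's built-in `hasattr`, assuming a reasonably recent scikit-learn release.

```python
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

checked = ("fit_transform", "predict", "fit_predict")
# PCA is a transformer: it has fit_transform but no predict/fit_predict
pca_methods = {m: hasattr(PCA(), m) for m in checked}
# LinearRegression is a predictor: it has predict but neither fit_transform nor fit_predict
lr_methods = {m: hasattr(LinearRegression(), m) for m in checked}
print("PCA:", pca_methods)
print("LinearRegression:", lr_methods)
```

Neither estimator covers both roles on its own, which is why they have to be chained.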
Upvotes: 1
Views: 4289
Reputation: 5367
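The order of the pipeline steps matters. Every step except the last must be a transformer (it must implement `transform()` in addition to `fit()` or `fit_transform()`), while the last step can be any estimator, for example a regressor that implements `predict()`.
Also, logically, you first transform your features and then apply a predictive classification/regression model.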
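```python
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
Y = df["Response"]
X = ...  # the numeric feature columns (numericDF in the question)
pipeline = Pipeline([('scaling', StandardScaler()),
                     ('pca', PCA(n_components=10, whiten=True)),
                     ('regr', LinearRegression())])
# LinearRegression has no fit_predict(), so fit the whole pipeline, then predict
pipeline.fit(X, Y)
predictions = pipeline.predict(X)
```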
Upvotes: 6