rgk

Reputation: 866

Using sklearn Linear Regression and PCA in a single Pipeline

I have a Pandas data frame with 20 numeric features and a numeric response column. I would like to first apply PCA to bring the dimensionality down to 10 and then run Linear Regression to predict the numeric response. I can currently do this in two steps:

pipeline = Pipeline([('scaling', StandardScaler()),
                     ('pca', PCA(n_components=10, whiten=True))])
newDF = pipeline.fit_transform(numericDF)

Y = df["Response"]
model = LinearRegression()
model.fit(newDF, Y)

Is there a way to combine Linear Regression in the above pipeline? I ask this question because

  1. fit_transform is not supported in Linear Regression.
  2. fit_predict can't be used with PCA.
  3. It's not a one-off use case

How could I run PCA and then Linear Regression all in the same pipeline?

Upvotes: 1

Views: 4289

Answers (1)

00__00__00

Reputation: 5367

The order of the pipeline steps matters. Only the last step needs to implement predict(); every earlier step must be a transformer that implements fit() and transform(). This is also the logical order: you first transform your features and then apply a predictive classification or regression model.

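from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
Y = df["Response"]
X=...
pipeline = Pipeline([('scaling', StandardScaler()),
                     ('pca', PCA(n_components=10, whiten=True)),
                     ('regr', LinearRegression())])
# LinearRegression has no fit_predict(), so fit the pipeline and then predict
pipeline.fit(numericDF, Y)
predictions = pipeline.predict(numericDF)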
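
For anyone who wants to try this end to end, here is a minimal sketch on made-up data. The synthetic numericDF and Y, the feature names f0..f19, and the variables n_kept and coef_shape are assumptions for illustration only, not the asker's actual data. It fits the whole pipeline in one call, predicts, and then reads the fitted steps back by name.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the question's data (assumption):
# 200 rows, 20 numeric features, one numeric response.
rng = np.random.default_rng(0)
numericDF = pd.DataFrame(rng.normal(size=(200, 20)),
                         columns=[f"f{i}" for i in range(20)])
Y = numericDF.sum(axis=1) + rng.normal(scale=0.1, size=200)

pipeline = Pipeline([('scaling', StandardScaler()),
                     ('pca', PCA(n_components=10, whiten=True)),
                     ('regr', LinearRegression())])

# fit() scales, projects onto 10 components, then fits the regression
pipeline.fit(numericDF, Y)
# predict() applies the fitted scaler and PCA before calling the regressor
predictions = pipeline.predict(numericDF)

# Fitted steps stay accessible by the names given in the Pipeline
n_kept = pipeline.named_steps['pca'].n_components_
coef_shape = pipeline.named_steps['regr'].coef_.shape
```

Because the regression sees only the 10 principal components, its coef_ has one entry per component rather than one per original feature.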

Upvotes: 6
