Reputation: 10383
I want to use principal component analysis to reduce some noise before applying linear regression.
I have 1000 samples and 200 features:
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.decomposition import PCA
X = np.random.rand(1000,200)
y = np.random.rand(1000,1)
With this data I can train my model:
model = LinearRegression()
model.fit(X,y)
But if I try the same after applying PCA:
pca = PCA(n_components=8)
pca.fit(X)
PCA(copy=True, iterated_power='auto', n_components=8, random_state=None,
svd_solver='auto', tol=0.0, whiten=False)
principal_components = pca.components_
model.fit(principal_components,y)
I get this error:
ValueError: Found input variables with inconsistent numbers of samples: [8, 1000]
Upvotes: 3
Views: 14540
Reputation: 3086
Try this:
pca = PCA(n_components=8)
X_pca = pca.fit_transform(X)
model.fit(X_pca,y)
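That is, fit_transform fits the PCA to X and projects X in a single step, returning an array of shape (1000, 8), named X_pca here. Use that as the regression input instead of pca.components_, which holds the 8 principal axes as an (8, 200) array: one row per component, not one row per sample. That is why scikit-learn reports 8 samples against the 1000 in y.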
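To make the shape difference concrete, here is a minimal sketch that reuses the random data from the question. The fixed seed and the variable names (rng, components_shape, projected_shape, pipe, pred_shape) are my own additions for illustration. The last part wraps both steps with make_pipeline, which is one possible way to keep the PCA and the regression together, not the only one.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

# Same random data as in the question; the seed is an assumption,
# added only so the run is repeatable.
rng = np.random.default_rng(0)
X = rng.random((1000, 200))
y = rng.random((1000, 1))

pca = PCA(n_components=8)
X_pca = pca.fit_transform(X)  # projected samples, one row per sample

# components_ holds the principal axes: one row per component.
components_shape = pca.components_.shape  # (8, 200)
projected_shape = X_pca.shape             # (1000, 8)

# The regression needs one row per sample, so it is fitted on X_pca.
model = LinearRegression()
model.fit(X_pca, y)

# Optional: chain both steps so new data goes through the same projection.
pipe = make_pipeline(PCA(n_components=8), LinearRegression())
pipe.fit(X, y)
pred_shape = pipe.predict(X[:5]).shape    # (5, 1)
```

With the pipeline you call pipe.predict on raw 200-feature rows and the PCA projection is applied for you. Without it, new data has to go through pca.transform before model.predict.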
Upvotes: 10