Reputation: 10433
I'm using sklearn to do PCA and testing the functions with some dummy data. When I have more samples than the number of components I want to keep, it works just fine:
from sklearn.decomposition import PCA
import numpy as np
features_training = np.random.rand(10,30)
components = 8
pca = PCA(n_components=int(components))
X_pca = pca.fit_transform(features_training)
From the code above I get a 10*8 matrix.
X_pca.shape
(10, 8)
But for data of the same shape, if I try to keep 15 components:
features_training = np.random.rand(10,30)
components = 15
pca = PCA(n_components=int(components))
X_pca = pca.fit_transform(features_training)
I don't get a 10*15 matrix but a 10*10 one.
X_pca.shape
(10, 10)
So it seems that the number of components is limited not only by the number of features but also by the number of samples. Why is that?
Upvotes: 0
Views: 549
Reputation: 36619
I can't tell you how PCA actually works internally, but the scikit-learn documentation for PCA mentions that the actual n_components = min(n_samples, specified n_components). More generally, the number of components cannot exceed min(n_samples, n_features).
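As a rough illustration of why the number of samples matters, here is a minimal sketch using only NumPy rather than sklearn's PCA. It assumes PCA amounts to centering the data and taking an SVD; the variable names and the fixed seed are my own. With 10 samples and 30 features, the centered matrix has only 10 singular values, and its rank is at most n_samples - 1, so there are no further directions of variance to return.

```python
import numpy as np

# Dummy data with the same shape as in the question: 10 samples, 30 features
rng = np.random.default_rng(0)
X = rng.random((10, 30))

# PCA works on mean-centered data; centering removes one degree of freedom
X_centered = X - X.mean(axis=0)

# One singular value per possible component: min(n_samples, n_features) of them
singular_values = np.linalg.svd(X_centered, compute_uv=False)

# Numerical rank = number of directions that actually carry variance
rank = np.linalg.matrix_rank(X_centered)

print(len(singular_values))  # number of components the SVD can produce
print(rank)                  # number of components with non-zero variance
```

So asking for 15 components from 10 samples cannot give 15 meaningful directions. Depending on the scikit-learn version, I believe such a request is either capped silently, as in the question, or rejected with a ValueError.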
Upvotes: 1