Reputation: 1163
Does sklearn's PCA treat the columns of the dataframe as the vectors to reduce, or the rows?
I ask because when I do this:
import pandas as pd
from sklearn.decomposition import PCA

df = pd.DataFrame([[1,-21,45,3,4],[4,5,89,-5,6],[7,-4,58,1,19],[10,11,74,20,12],[13,14,15,45,78]])  # 5 rows, 5 columns
pca = PCA(n_components=3)
pca.fit(df)
df_pcs = pd.DataFrame(data=pca.components_, index=df.index)
I get the following error:
ValueError: Shape of passed values is (5, 3), indices imply (5, 5)
Upvotes: 0
Views: 1196
Reputation: 36599
Rows represent samples and columns represent features. PCA reduces the dimensionality of the data, i.e. the number of features, so it reduces the columns.
In terms of vectors: each row is treated as a single feature vector, and PCA reduces the length of that vector.
If you have a dataframe of shape, say, [100, 6] and PCA n_components is set to 3, your output will have shape [100, 3].
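A quick way to check this is to run PCA on data of that shape. This is only a minimal sketch: the random data and the names `X` and `reduced` are made up for illustration, and it assumes NumPy, pandas and scikit-learn are installed.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

# Illustrative random data: 100 samples (rows), 6 features (columns).
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(100, 6)))

pca = PCA(n_components=3)
# fit_transform learns the principal axes and projects every row onto them.
reduced = pca.fit_transform(X)

print(X.shape)        # (100, 6)
print(reduced.shape)  # (100, 3)
```

The number of rows is unchanged; only the number of columns drops from 6 to 3.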
# You need this: it projects the data onto the principal components.
df_pcs = pca.transform(df)
# This raises the error because the shapes don't match.
df_pcs = pd.DataFrame(data=pca.components_, index=df.index)
pca.components_ is an array of shape [3, 5], while your index parameter uses df.index, which has shape [5,]. Hence the error. pca.components_ represents a completely different thing.
According to the documentation:
components_ : array, [n_components, n_features]
Principal axes in feature space, representing the directions of maximum variance in the data.
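To make the difference concrete, here is a sketch that builds both dataframes with matching labels. The labels "PC1" to "PC3" and the names `scores` and `df_axes` are my own choices for illustration, not something sklearn provides.

```python
import pandas as pd
from sklearn.decomposition import PCA

df = pd.DataFrame([[1, -21, 45, 3, 4],
                   [4, 5, 89, -5, 6],
                   [7, -4, 58, 1, 19],
                   [10, 11, 74, 20, 12],
                   [13, 14, 15, 45, 78]])  # 5 rows, 5 columns

pca = PCA(n_components=3)
scores = pca.fit_transform(df)

pc_names = ["PC1", "PC2", "PC3"]  # hypothetical labels for the components

# Transformed data: one row per sample, one column per component.
df_pcs = pd.DataFrame(scores, index=df.index, columns=pc_names)

# Principal axes: one row per component, one column per original feature.
df_axes = pd.DataFrame(pca.components_, index=pc_names, columns=df.columns)

print(df_pcs.shape)   # (5, 3)
print(df_axes.shape)  # (3, 5)
```

So df.index is a valid index for the transformed data, while pca.components_ needs to be indexed by component and have df.columns as its columns.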
Upvotes: 3