Reputation: 15675
I am using sklearn's PCA for dimensionality reduction on a large set of images. Once the PCA is fitted, I would like to see what the components look like.
One can do so by looking at the components_ attribute. Not realizing that was available, I did something else instead:
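import numpy as np

# One row per component: all zeros except a single 1
each_component = np.eye(total_components)
# Map each of those basis vectors back to pixel space
component_im_array = pca.inverse_transform(each_component)
for i in range(num_components):
    component_im = component_im_array[i, :].reshape(height, width)
    # do something with component_im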
In other words, I create a vector in PCA space with every feature set to 0 except one, which is set to 1. By inverse-transforming it, I should get the image in the original space that, once transformed, can be expressed solely by that PCA component.
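The reasoning in the last paragraph can be checked numerically. This is a minimal sketch, assuming random synthetic data in place of the flattened images and PCA's default whiten=False; the sizes and `k` are hypothetical. Each identity row is pushed through inverse_transform and then transform again; if the reasoning holds, the round trip returns the identity matrix.

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical stand-in for the flattened images: 200 samples, 64 features,
# shifted so the features have a nonzero mean.
rng = np.random.default_rng(0)
data = rng.normal(size=(200, 64)) + 5.0

k = 10  # hypothetical number of components
pca = PCA(n_components=k).fit(data)

# One basis vector per component in PCA space (rows of the identity matrix).
basis = np.eye(k)
images = pca.inverse_transform(basis)  # shape (k, 64), back in feature space
roundtrip = pca.transform(images)      # shape (k, k), back in PCA space

# Each row maps back to a single nonzero coordinate.
print(np.allclose(roundtrip, np.eye(k)))
```

So in PCA space the method does what it intends; the difference from components_ lies in what the resulting images contain in the original space.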
The following image shows the results. On the left is the component calculated using my method; on the right is pca.components_[i] directly. With my method, most images look very similar (though not identical), whereas the images taken from components_ are very different from one another, as I would have expected.
Is there a conceptual problem in my method? Clearly the components from pca.components_[i] are correct (or at least more correct than the ones I'm getting). Thanks!
Upvotes: 4
Views: 16079
Reputation: 363547
The difference between grabbing the components_ and doing an inverse_transform on the identity matrix is that the latter adds in the empirical mean of each feature. That is:
def inverse_transform(self, X):
    # Simplified: this is the whiten=False case
    return np.dot(X, self.components_) + self.mean_
where self.mean_ was estimated from the training set.
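The effect of the added mean can be seen directly. This is a sketch on hypothetical data: 100 random 8x8 "images" with a large shared offset, standing in for real pixel intensities, with PCA's default whiten=False. Subtracting mean_ from the identity-matrix result recovers components_. The cosine comparison illustrates why the images from the question look alike: when the mean vector is large relative to the unit-norm components (an assumption here, though common for intensity images), every row is dominated by that same mean.

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical data: 100 "images" of 8x8 pixels with a large shared offset.
height, width = 8, 8
rng = np.random.default_rng(1)
data = rng.normal(size=(100, height * width)) + 10.0

k = 5
pca = PCA(n_components=k).fit(data)

# What the question computes: the identity matrix pushed back to pixel space.
from_identity = pca.inverse_transform(np.eye(k))

# Removing the fitted mean recovers components_ (whiten=False).
recovered = from_identity - pca.mean_
print(np.allclose(recovered, pca.components_))

# Each component viewed as an image, with no mean added.
component_images = pca.components_.reshape(k, height, width)

def cosine(a, b):
    """Cosine similarity between two flat vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Rows from the identity approach are dominated by the shared mean...
print(cosine(from_identity[0], from_identity[1]))  # close to 1
# ...while the components themselves are orthogonal.
print(cosine(pca.components_[0], pca.components_[1]))  # close to 0
```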
Upvotes: 6
Reputation: 1276
Components and inverse transform are two different things. The inverse transform maps the scores (data expressed in PCA space) back to the original image space:
from sklearn.decomposition import PCA

# Create a PCA model with two principal components
pca = PCA(2)
pca.fit(data)
# Project the original data into PCA space (these are scores, not components)
scores = pca.transform(data)
# Reconstruct from the 2-dimensional scores
reconstruct = pca.inverse_transform(scores)
# The residual is the part not explained by the first two components
residual = data - reconstruct
Thus inverse_transform is applied to the transformed (projected) data, not to the components, and the two are completely different things. You almost never call inverse_transform on the original data itself. pca.components_ are the actual vectors representing the underlying axes used to project the data into PCA space.
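The relationships between scores, reconstruction and residual can be verified on small random data. This sketch assumes a hypothetical `data` array of 50 samples by 20 features and PCA's default whiten=False. It checks that inverse_transform is the scores times components_ plus mean_, that the residual has no projection onto the two kept axes, and that keeping every component reconstructs the data exactly.

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical data: 50 samples with 20 features.
rng = np.random.default_rng(2)
data = rng.normal(size=(50, 20))

pca = PCA(2).fit(data)
scores = pca.transform(data)                 # coordinates in PCA space, shape (50, 2)
reconstruct = pca.inverse_transform(scores)  # back in feature space, shape (50, 20)
residual = data - reconstruct

# inverse_transform: scores times the component vectors, plus the mean.
print(np.allclose(reconstruct, scores @ pca.components_ + pca.mean_))

# The residual has no component along either kept principal axis.
print(np.allclose(residual @ pca.components_.T, 0))

# Keeping every component leaves nothing unexplained.
full = PCA().fit(data)
print(np.allclose(data, full.inverse_transform(full.transform(data))))
```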
Upvotes: 6