Reputation: 311
This question concerns how to de-center and "restore" the data in a lower dimension after performing PCA.
I'm doing a simple principal component analysis with sklearn. As I understand it, the implementation should take care of (1) centering the data when creating the components and (2) de-centering the data after the transformation. However, after transforming the data, it is still centered. How can I project the data to a lower-dimensional space while preserving the characteristics of the original data? Since I would be doing dimensionality reduction on high-dimensional data, I wouldn't have the appropriate mean for each principal component. How can that be derived?
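For reference, here is a minimal sketch of what `transform` computes, assuming the default `whiten=False`: it subtracts the per-feature mean stored in `pca.mean_` and projects onto `pca.components_`, with no de-centering step afterwards. The names `X_manual`, `same_as_transform`, and `scores_are_centered` are illustrative only.

```python
import numpy as np
from sklearn.decomposition import PCA

# Same toy data as in the question.
X = np.array([[-1, -1, -1], [-2, -1, -1], [-3, -2, -3],
              [1, 1, 1], [2, 1, 2], [3, 2, 3]]) + 3

pca = PCA(n_components=2).fit(X)
X_trans = pca.transform(X)

# Manual equivalent: subtract the per-feature mean, then project
# onto the principal axes (rows of components_).
X_manual = (X - pca.mean_) @ pca.components_.T

same_as_transform = np.allclose(X_trans, X_manual)
# Because the mean was subtracted, the scores average to zero per component.
scores_are_centered = np.allclose(X_trans.mean(axis=0), 0.0)
print(same_as_transform, scores_are_centered)
```

If both checks hold, the centered output is the expected behavior of `transform` rather than a missing step.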
Reducing 3 dimensions to 2 dimensions:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
X = np.array([[-1, -1, -1], [-2, -1, -1], [-3, -2, -3], [1, 1, 1], [2, 1, 2], [3, 2, 3]]) + 3
X.shape
(6, 3)
fig = plt.figure(figsize=(10, 8), dpi=80, facecolor='w', edgecolor='k')
ax = fig.add_subplot(111, projection='3d')
ax.scatter(X[:, 0], X[:, 1], X[:, 2], marker='*')
plt.title('original')
plt.show()
PCA with 2 components:
pca = PCA(n_components=2)
pca.fit(X)
X_trans = pca.transform(X)
X_trans.shape
(6, 2)
plt.plot(X_trans[:,0], X_trans[:,1], '*')
plt.show()
What I would like to do at this stage is to "restore" my data in this lower dimension, such that the values of the data points correspond to the original data. It should still have only 2 dimensions, but not be centered around the mean.
Performing the inverse transform, as suggested below, actually brings me back to 3 dimensions:
X_approx = pca.inverse_transform(X_trans)
X_approx.shape
(6, 3)
I want to remain in 2 dimensions but still have my data resemble its original form as closely as possible, and not be centered around the mean.
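One possible reading of "the mean for each principal component" is the original mean expressed in the component basis. The sketch below, assuming the default `whiten=False`, adds that projected mean back to the 2-D scores, which gives the same result as projecting the un-centered data directly. The names `mean_in_pc_space` and `X_2d_uncentered` are hypothetical, and whether this shift is the desired "restoration" depends on the use case.

```python
import numpy as np
from sklearn.decomposition import PCA

# Same toy data as in the question.
X = np.array([[-1, -1, -1], [-2, -1, -1], [-3, -2, -3],
              [1, 1, 1], [2, 1, 2], [3, 2, 3]]) + 3

pca = PCA(n_components=2).fit(X)
X_trans = pca.transform(X)  # centered 2-D scores

# The original mean expressed in principal-component coordinates, shape (2,).
mean_in_pc_space = pca.mean_ @ pca.components_.T

# Shift the scores back: same 2-D axes, no longer centered at the origin.
X_2d_uncentered = X_trans + mean_in_pc_space

# Equivalent route: project the raw, un-centered data onto the same axes.
X_2d_direct = X @ pca.components_.T
print(X_2d_uncentered.shape, np.allclose(X_2d_uncentered, X_2d_direct))
```

Note that the coordinates are still expressed along the principal axes, not along the original features, so the values will not match the original columns one-to-one.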
Upvotes: 3
Views: 1497
Reputation: 39052
You are only fitting the data and plotting the transformed data. To get the original data back from the lower-dimensional representation, you need to use inverse_transform, which gives you (an approximation of) the original data back in its original space, as I show in the plot below. From the docs:
inverse_transform(X)
Transform data back to its original space.
pca = PCA(n_components=2)
pca.fit(X)
X_trans = pca.transform(X)
X_original = pca.inverse_transform(X_trans)
plt.plot(X_original[:,0], X_original[:,1], 'r*')
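To make explicit what the plotted array contains, here is a small sketch, assuming the default `whiten=False`. `inverse_transform` maps the 2-D scores back into the 3-feature space and re-adds `pca.mean_`, so the plot above shows the first two original features of that reconstruction. The names `X_manual` and `max_abs_error` are illustrative only.

```python
import numpy as np
from sklearn.decomposition import PCA

# Same toy data as in the question.
X = np.array([[-1, -1, -1], [-2, -1, -1], [-3, -2, -3],
              [1, 1, 1], [2, 1, 2], [3, 2, 3]]) + 3

pca = PCA(n_components=2).fit(X)
X_trans = pca.transform(X)
X_original = pca.inverse_transform(X_trans)

# Manual equivalent: map the scores back to feature space, then re-add the mean.
X_manual = X_trans @ pca.components_ + pca.mean_

print(X_original.shape)                   # (6, 3): back in the original space
print(np.allclose(X_original, X_manual))

# Largest deviation from the true data; the discarded third component
# accounts for whatever error remains.
max_abs_error = np.abs(X - X_original).max()
print(max_abs_error)
```

The reconstruction is un-centered (its column means match those of `X`), but it has three columns, not two.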
Upvotes: 3