Mountain_sheep

Reputation: 311

Sklearn PCA, how to restore mean in lower dimension?

This question concerns how to de-center and "restore" the data in a lower dimension after performing PCA.

I'm doing a simple principal component analysis with sklearn. As I understand it, the implementation should take care of (1) centering the data when computing the components and (2) de-centering the data after the transformation. However, after transforming the data it is still centered. How can I project the data to a lower-dimensional space while preserving the characteristics of the original data? Since I would be doing dimensionality reduction on high-dimensional data, I wouldn't have the appropriate mean for each principal component. How can that be derived?

Reducing 3 dimensions to 2 dimensions:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

X = np.array([[-1, -1, -1], [-2, -1, -1], [-3, -2, -3], [1, 1, 1], [2, 1, 2], [3, 2, 3]]) + 3
X.shape

(6, 3)

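fig = plt.figure(figsize=(10, 8), dpi=80, facecolor='w', edgecolor='k')
ax = fig.add_subplot(111, projection='3d')
# pass the marker by keyword: the 4th positional argument of a 3D scatter is zdir, not the marker
ax.scatter(X[:, 0], X[:, 1], X[:, 2], marker='*')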
plt.title('original')
plt.show()


PCA with 2 components:

pca = PCA(n_components=2)
pca.fit(X)
X_trans = pca.transform(X)
X_trans.shape

(6, 2)

plt.plot(X_trans[:,0], X_trans[:,1], '*')
plt.show()
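
The centering seen in this plot follows from how the projection is computed. As a minimal sketch, assuming the default `whiten=False`, `transform` subtracts the per-feature mean stored in `pca.mean_` and then projects onto `pca.components_`, so every column of the result has mean zero. The names `manual` and `column_means` are only illustrative:

```python
import numpy as np
from sklearn.decomposition import PCA

X = np.array([[-1, -1, -1], [-2, -1, -1], [-3, -2, -3],
              [1, 1, 1], [2, 1, 2], [3, 2, 3]]) + 3

pca = PCA(n_components=2).fit(X)
X_trans = pca.transform(X)

# Reproduce transform() by hand: subtract the mean, then project onto the components
manual = (X - pca.mean_) @ pca.components_.T

same_as_transform = np.allclose(X_trans, manual)
column_means = X_trans.mean(axis=0)  # numerically zero for each component

print(same_as_transform)
print(np.allclose(column_means, 0.0))
```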


What I would like to do at this stage is to "restore" my data in this lower dimension, such that the values of the data points correspond to the original data. It should still have only 2 dimensions, but not be centered around the mean.

Performing the inverse transform, as suggested below, actually brings me back to 3 dimensions:

X_approx = pca.inverse_transform(X_trans) 
X_approx.shape

(6, 3)
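
The return to 3 dimensions is expected. As a sketch, again assuming the default `whiten=False`, `inverse_transform` maps the 2D scores back through `pca.components_` and adds `pca.mean_` back, which yields one value per original feature. The name `manual_inverse` is illustrative:

```python
import numpy as np
from sklearn.decomposition import PCA

X = np.array([[-1, -1, -1], [-2, -1, -1], [-3, -2, -3],
              [1, 1, 1], [2, 1, 2], [3, 2, 3]]) + 3

pca = PCA(n_components=2).fit(X)
X_trans = pca.transform(X)
X_approx = pca.inverse_transform(X_trans)

# Reproduce inverse_transform() by hand: back-project the scores, then add the mean
manual_inverse = X_trans @ pca.components_ + pca.mean_

print(X_approx.shape)
print(np.allclose(X_approx, manual_inverse))
```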

I want to remain in 2 dimensions, but still have my data resemble its original form as closely as possible and not be centered around the mean.

Upvotes: 3

Views: 1497

Answers (1)

Sheldore

Reputation: 39052

You are only fitting the PCA and plotting the transformed data, which is centered. To undo the centering, use inverse_transform, which maps the reduced data back to the original feature space, so the result has 3 columns again. The plot below shows its first two columns. From the docs:

inverse_transform(X)

Transform data back to its original space.

pca = PCA(n_components=2)
pca.fit(X)

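X_trans = pca.transform(X)
# X_original has shape (6, 3): it lives in the original feature space
X_original = pca.inverse_transform(X_trans)
# plot only the first two original features
plt.plot(X_original[:, 0], X_original[:, 1], 'r*')
plt.show()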
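
If the goal is to stay in 2 dimensions, one possible reading of the question is to keep the principal-component coordinates but shift them by the projection of the mean, so the points are no longer centered. This is a minimal sketch under that assumption, with the default `whiten=False`; `mean_2d` and `X_2d` are illustrative names, not part of sklearn:

```python
import numpy as np
from sklearn.decomposition import PCA

X = np.array([[-1, -1, -1], [-2, -1, -1], [-3, -2, -3],
              [1, 1, 1], [2, 1, 2], [3, 2, 3]]) + 3

pca = PCA(n_components=2).fit(X)
X_trans = pca.transform(X)

# Project the original-space mean onto the two principal axes
mean_2d = pca.mean_ @ pca.components_.T  # shape (2,)

# Shift the centered scores by the projected mean
X_2d = X_trans + mean_2d

# Equivalent to projecting the raw, uncentered data directly
print(X_2d.shape)
print(np.allclose(X_2d, X @ pca.components_.T))
```

These coordinates are still expressed along the principal axes rather than the original features, so their values will not match individual columns of X.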


Upvotes: 3
