Charlie Parker

Reputation: 5301

Why are my manual PCA reconstructions not matching python's sklearn's reconstructions?

I was trying to check my understanding of PCA by verifying my own implementation, so I ran PCA with 12 components on the MNIST data set (loaded through the TensorFlow interface, which normalized it for me). I took the principal components computed by sklearn and made reconstructions as follows:

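import numpy as np
from sklearn.decomposition import PCA
k = 12  # number of principal components
pca = PCA(n_components=k)
pca = pca.fit(X_train)
X_pca = pca.transform(X_train)
# do manual PCA
U = pca.components_  # shape (k, n_features)
my_reconstruct = np.dot(U.T, np.dot(U, X_train.T)).T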

Then I used the reconstruction interface provided by sklearn, as follows:

pca = PCA(n_components=k)
pca = pca.fit(X_train)
X_pca = pca.transform(X_train)
X_reconstruct = pca.inverse_transform(X_pca)

and then checked the error between the two as follows (each row is a data point and each column a feature):

from numpy import linalg as LA
print('X_recon - X_my_reconstruct', (1.0/my_reconstruct.shape[0])*LA.norm(my_reconstruct - X_reconstruct)**2)
#X_recon - X_my_reconstruct 1.47252586279

As you can see, the error is nonzero and quite noticeable. Why? How does sklearn's reconstruction differ from mine?
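The discrepancy can be reproduced without MNIST. The sketch below is a minimal reproduction under stated assumptions: `X_train` is synthetic uniform data standing in for the normalized images (column j is drawn from [0, j), so every feature has a clearly nonzero mean), and the helper names `both_reconstructions` and `mean_sq_error` are hypothetical. It runs the same two reconstructions once on the raw data and once on data that has already been column-centered.

```python
import numpy as np
from numpy import linalg as LA
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
# Synthetic stand-in for normalized MNIST (assumption): column j (1-based) is
# uniform on [0, j), so every feature has a clearly nonzero mean
X_train = rng.rand(500, 20) * np.arange(1, 21)
k = 5


def both_reconstructions(X, k):
    """Return (manual reconstruction from the question, sklearn's inverse_transform)."""
    pca = PCA(n_components=k).fit(X)
    U = pca.components_  # shape (k, n_features)
    manual = np.dot(U.T, np.dot(U, X.T)).T  # projects X without subtracting the mean
    sk = pca.inverse_transform(pca.transform(X))
    return manual, sk


def mean_sq_error(A, B):
    # Same error measure as in the question: squared Frobenius norm per data point
    return (1.0 / A.shape[0]) * LA.norm(A - B) ** 2


err = mean_sq_error(*both_reconstructions(X_train, k))
X_centered = X_train - X_train.mean(axis=0)
err_c = mean_sq_error(*both_reconstructions(X_centered, k))
print('raw data:', err)
print('centered data:', err_c)
```

A gap that shows up only in the raw-data run and vanishes (up to floating-point noise) in the centered run points at the feature mean rather than at the projection itself.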

Upvotes: 3

Views: 2864

Answers (1)

ali_m

Reputation: 74172

I see a couple of issues:

  1. The dot product should be X_pca.dot(pca.components_). PCA factorizes your mean-centered X_train matrix using SVD:

    X_train = U·S·Vᵀ.

    Here, pca.components_ corresponds to Vᵀ (a (k, n_features) matrix), not U (an (n_datapoints, k) matrix), while the output of pca.transform corresponds to U·S.

    The sklearn implementation of PCA is quite readable, and can be found here. I also wrote a pure numpy example in this previous answer.

  2. Did you center X_train by subtracting the mean value for each column before doing the fitting?

    The PCA class automatically centers your data and stores the original mean vector in its .mean_ attribute. If the mean of your input features is nonzero, you need to add it back to reconstructions computed from X_pca, i.e. my_reconstruct += pca.mean_.
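Both points can be checked numerically. The sketch below uses synthetic data as a hypothetical stand-in for MNIST and assumes PCA's default whiten=False. It compares pca.components_ and the output of transform against an SVD of the centered data computed with np.linalg.svd, then rebuilds the reconstruction as X_pca.dot(pca.components_) + pca.mean_ and compares it with inverse_transform. Because the sign of each singular vector is arbitrary, the SVD comparisons ignore signs.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
# Hypothetical data with nonzero column means and well-separated variances
X_train = rng.rand(500, 20) * np.arange(1, 21)
k = 5

# svd_solver='full' requests an exact SVD so the comparison below is deterministic
pca = PCA(n_components=k, svd_solver='full').fit(X_train)
X_pca = pca.transform(X_train)

# Point 1: SVD of the *centered* data, Xc = U S V^T
Xc = X_train - pca.mean_
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
# components_ matches the top-k rows of V^T, up to a sign per component
comps_match = np.allclose(np.abs(pca.components_), np.abs(Vt[:k]))
# transform() output matches U S, up to the same per-component sign flips
scores_match = np.allclose(np.abs(X_pca), np.abs(U[:, :k] * S[:k]))

# Point 2: map back with components_, then add the stored mean
my_reconstruct = X_pca.dot(pca.components_)
my_reconstruct += pca.mean_
recon_match = np.allclose(my_reconstruct, pca.inverse_transform(X_pca))

print(comps_match, scores_match, recon_match)
```

With whiten=True, transform also rescales each component, so this simple X_pca.dot(pca.components_) + pca.mean_ reconstruction would no longer match.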

Upvotes: 2
