Reputation: 169
I have got a data set with 68 dimensions * 100 observations to create a pca space using matplotlib in python.
Now I have got another set of data (x) with 6 dimensions * 100 observations. Is it possible to project these data onto the pca space created before?
I tried to use pca_space.project(x) to project, but the dimensions of the pca space does not match the new data set, it doesn't work.
Upvotes: 1
Views: 3451
Reputation:
Given 2 random multivariate gaussians:
mu_vec1 = np.array([0,0,0])
cov_mat1 = np.array([[1,0,0],[0,1,0],[0,0,1]])
class1_sample = np.random.multivariate_normal(mu_vec1, cov_mat1, 20).T
assert class1_sample.shape == (3,20), "The matrix has not the dimensions 3x20"
mu_vec2 = np.array([1,1,1])
cov_mat2 = np.array([[1,0,0],[0,1,0],[0,0,1]])
class2_sample = np.random.multivariate_normal(mu_vec2, cov_mat2, 20).T
assert class1_sample.shape == (3,20), "The matrix has not the dimensions 3x20"
One example for reducing the 3D space onto a 2D feature subspace using matplotlib would be
from matplotlib.mlab import PCA as mlabPCA
mlab_pca = mlabPCA(all_samples.T)
print('PC axes in terms of the measurement axes'\
' scaled by the standard deviations:\n',\
mlab_pca.Wt)
plt.plot(mlab_pca.Y[0:20,0],mlab_pca.Y[0:20,1], 'o', markersize=7,\
color='blue', alpha=0.5, label='class1')
plt.plot(mlab_pca.Y[20:40,0], mlab_pca.Y[20:40,1], '^', markersize=7,\
color='red', alpha=0.5, label='class2')
plt.xlabel('x_values')
plt.ylabel('y_values')
plt.xlim([-4,4])
plt.ylim([-4,4])
plt.legend()
plt.title('Transformed samples with class labels from matplotlib.mlab.PCA()')
plt.show()
But note that mlabPCA
scales the variables to unit variance (standardization). This may or may not be what you want. If you want more control, I suggest using the sckit-learn package:
from sklearn.decomposition import PCA as sklearnPCA
sklearn_pca = sklearnPCA(n_components=2)
sklearn_transf = sklearn_pca.fit_transform(all_samples.T)
plt.plot(sklearn_transf[0:20,0],sklearn_transf[0:20,1],\
'o', markersize=7, color='blue', alpha=0.5, label='class1')
plt.plot(sklearn_transf[20:40,0], sklearn_transf[20:40,1],\
'^', markersize=7, color='red', alpha=0.5, label='class2')
plt.xlabel('x_values')
plt.ylabel('y_values')
plt.xlim([-4,4])
plt.ylim([-4,4])
plt.legend()
plt.title('Transformed samples with class labels from matplotlib.mlab.PCA()')
plt.show()
Upvotes: 1