Reputation: 1470
I want the correlations between individual variables and principal components in Python. I am using PCA from sklearn, but I don't understand how I can obtain the loading matrix after I have decomposed my data. My code is below.
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

iris = load_iris()
data, y = iris.data, iris.target
pca = PCA(n_components=2)
transformed_data = pca.fit(data).transform(data)
# Note: explained_variance_ratio_ gives the fraction of variance per component;
# the eigenvalues themselves are stored in pca.explained_variance_.
eigenValues = pca.explained_variance_
The documentation at http://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html doesn't mention how this can be achieved.
Upvotes: 32
Views: 41931
Reputation: 40878
I think that @RickardSjogren is describing the eigenvectors, while @BigPanda is giving the loadings. There's a big difference; see: Loadings vs eigenvectors in PCA: when to use one or another?.
I created this PCA class with a loadings method.
Loadings, as given by pca.components_ * np.sqrt(pca.explained_variance_), are more analogous to coefficients in a multiple linear regression. I don't use .T here because in the PCA class linked above, the components are already transposed. numpy.linalg.svd produces u, s, and vt, where vt is the Hermitian transpose, so you first need to recover v with vt.T.
There is also one other important detail: the signs (positive/negative) on the components and loadings in sklearn's PCA may differ from those in packages such as R. More on that here: In sklearn.decomposition.PCA, why are components_ negative?.
Upvotes: 20
Reputation: 327
Multiply each component by the square root of its corresponding eigenvalue:
pca.components_.T * np.sqrt(pca.explained_variance_)
This should produce your loading matrix.
Upvotes: 25
Reputation: 4238
According to this blog, the rows of pca.components_ are the loading vectors. So:
loadings = pca.components_
Upvotes: 12