Correspondence between PCA principal components and the original variables

Question

I want to apply PCA to Kaggle's Titanic dataset.

For now I'm just taking the columns that have numeric values and dropping the NaN values, so I have five variables, or four if we ignore the dependent variable ('Survived').

I have this loaded into a DataFrame df. If I take five components using PCA:

from sklearn.decomposition import PCA

pca_model = PCA(n_components=5)
pca_model.fit(df)
pca_model.explained_variance_ratio_  # fraction of total variance per component

[  9.30197643e-01   6.93699966e-02   2.24377672e-04   1.49076254e-04
   5.89069784e-05]

I see that 93 percent of the variance comes from the first component. Is it possible to get the same kind of values for the original variables? E.g. Age -> 0.3 of the variance, Fare -> 0.6.

Can I know what percentage of each principal component is given by each of the original variables?
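
To make the question concrete, here is a minimal sketch of the kind of breakdown I'm after, assuming scikit-learn's fitted `components_` attribute is the right place to look. The data is a synthetic stand-in with made-up values (not the real Titanic columns), and the names `loadings`, `shares` and `variable_share` are my own. Since each row of `components_` is a unit-length vector, its squared entries sum to 1 and can be read as the share of that component coming from each original variable.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

# Synthetic stand-in for the numeric columns (assumption: values are made up)
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Age": rng.normal(30, 14, 200),
    "Fare": rng.normal(32, 50, 200),
    "SibSp": rng.integers(0, 5, 200).astype(float),
    "Parch": rng.integers(0, 4, 200).astype(float),
})

# Keep as many components as there are variables
pca_model = PCA(n_components=df.shape[1])
pca_model.fit(df)

# Rows are components, columns are the original variables
loadings = pd.DataFrame(
    pca_model.components_,
    columns=df.columns,
    index=[f"PC{i + 1}" for i in range(pca_model.n_components_)],
)

# Squared weights: each row sums to 1, so an entry is the share of
# that component attributable to that original variable
shares = loadings ** 2

# Weight each component's shares by its explained variance ratio and
# sum over components to get a per-variable share of total variance
variable_share = shares.mul(pca_model.explained_variance_ratio_, axis=0).sum(axis=0)

print(shares.round(3))
print(variable_share.round(3))
```

When all components are kept, `variable_share` reduces to each column's variance divided by the total variance, so on unscaled data a column with a large numeric range (Fare here) will tend to dominate the first component.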
