Correspondence between PCA principal components and the original variables

I want to apply PCA to Kaggle's Titanic dataset.

For now I'm just taking the columns that have numeric values and dropping the NaN values, so I have five variables, or actually four if we ignore the dependent variable ('Survived').

I have this loaded into a DataFrame df. If I fit PCA with five components:

from sklearn.decomposition import PCA

pca_model = PCA(n_components=5)
pca_model.fit(df)
pca_model.explained_variance_ratio_

[  9.30197643e-01   6.93699966e-02   2.24377672e-04   1.49076254e-04
   5.89069784e-05]

So 93 percent of the variance comes from the first component. Is it possible to get the same kind of breakdown in terms of the original variables? E.g. Age -> 0.3 of the variance, Fare -> 0.6.

Can I know what percentage of each principal component is contributed by each of the original variables?

Upvotes: 0

Views: 487

Answers (1)

Hossein

Reputation: 2111

Each principal component is a linear combination of all of the original variables. You can inspect the weight of each original variable in each component using pca_model.components_, an array of shape (n_components, n_features) in which row i holds the weights for component i.
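
As a rough sketch of how that could look in practice, the snippet below labels the rows and columns of components_ and squares the weights. Because each row of components_ is a unit vector, the squared weights in a row sum to 1, so they can be read as the share of that component attributed to each variable. The data here is a synthetic stand-in, not the real Titanic file: the column names and distributions are assumptions made only so the example runs on its own.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

# Synthetic stand-in for the numeric Titanic columns (values are made up).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Age": rng.normal(30, 14, 200),
    "Fare": rng.normal(32, 50, 200),
    "SibSp": rng.integers(0, 5, 200).astype(float),
    "Parch": rng.integers(0, 4, 200).astype(float),
})

pca_model = PCA(n_components=df.shape[1])
pca_model.fit(df)

# Rows are principal components, columns are the original variables.
loadings = pd.DataFrame(
    pca_model.components_,
    columns=df.columns,
    index=[f"PC{i + 1}" for i in range(pca_model.n_components_)],
)

# Each row of components_ has unit length, so the squared weights
# in a row sum to 1 and can be read as per-variable shares.
share = loadings ** 2

print(loadings.round(3))
print(share.round(3))
```

Note that PCA on unscaled data is dominated by whichever variable has the largest variance (here the made-up Fare column), which is likely why the first component in the question explains 93 percent. Standardizing the columns first, for example with sklearn.preprocessing.StandardScaler, usually gives a more balanced picture.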

Upvotes: 1
