Reputation: 10433
I want to apply PCA to Kaggle's Titanic dataset.
For now I'm just taking the columns that have numeric values and dropping the NaN values, so I have five variables, or four if we ignore the dependent variable ('Survived').
I have this loaded into a DataFrame df. If I take five components using PCA:
from sklearn.decomposition import PCA
pca_model = PCA(n_components=5)
pca_model.fit(df)
pca_model.explained_variance_ratio_
[ 9.30197643e-01 6.93699966e-02 2.24377672e-04 1.49076254e-04
5.89069784e-05]
I see that 93 percent of the variance comes from the first component. Is it possible to get these same values in terms of the original variables? E.g. Age -> 0.3 of the variance, Fare -> 0.6.
Can I know what percentage of each principal component is given by each of the original variables?
Upvotes: 0
Views: 487
Reputation: 2111
Each PCA component is a linear combination of all of the original variables. You can see the weight of each original variable in each component using pca_model.components_, an array with one row per component and one column per original variable.
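As a rough illustration of reading pca_model.components_, here is a minimal sketch. It assumes scikit-learn's PCA and uses synthetic data in place of the real Titanic columns; the column names and values are hypothetical stand-ins, not the asker's actual data. Because each row of components_ is a unit vector, the squared weights in a row sum to 1, so they can be read as each variable's share of that component.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic stand-in for the numeric columns (names and values are assumptions).
rng = np.random.default_rng(0)
columns = ["Pclass", "Age", "SibSp", "Parch", "Fare"]
X = np.column_stack([
    rng.integers(1, 4, size=200),
    rng.normal(30, 14, size=200),
    rng.integers(0, 4, size=200),
    rng.integers(0, 3, size=200),
    rng.exponential(30, size=200),
]).astype(float)

pca_model = PCA(n_components=5)
pca_model.fit(X)

# Shape (n_components, n_features): row i holds the weight of each
# original variable in principal component i.
loadings = pca_model.components_

# Rows are unit vectors, so squared weights sum to 1 per component.
shares = loadings ** 2

for i, row in enumerate(shares):
    parts = ", ".join(f"{name}: {value:.3f}" for name, value in zip(columns, row))
    ratio = pca_model.explained_variance_ratio_[i]
    print(f"PC{i + 1} ({ratio:.3f} of variance): {parts}")
```

Note that without scaling, variables with a large numeric range tend to dominate the first component. Standardizing the columns first (for example with sklearn.preprocessing.StandardScaler) is a common way to make the shares comparable.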
Upvotes: 1