Ravi Biradar

Reputation: 61

After choosing k components in PCA, how do we find out which components (names of the columns) the algorithm has selected?

I am new to Data Science and I need some help understanding PCA. I know that each column constitutes one axis, but when PCA is done and the components are reduced to some value k, how do I know which columns were selected?

Upvotes: 2

Views: 5988

Answers (2)

Alperen Tahta

Reputation: 476

In PCA you compute the eigenvectors and eigenvalues of the covariance matrix to identify the principal components.
Principal components are new variables that are constructed as linear combinations or mixtures of the initial variables. These combinations are done in such a way that the new variables (i.e., principal components) are uncorrelated and most of the information within the initial variables is squeezed or compressed into the first components. So, the idea is that 10-dimensional data gives you 10 principal components, but PCA tries to put the maximum possible information into the first component, then the maximum remaining information into the second, and so on.

Geometrically speaking, principal components represent the directions of the data that explain a maximal amount of variance, that is to say, the lines that capture most of the information in the data. As there are as many principal components as there are variables in the data, principal components are constructed in such a manner that the first principal component accounts for the largest possible variance in the data set.
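Here is a minimal NumPy sketch of that eigendecomposition, reusing the toy matrix from the scikit-learn code further down (the matrix itself is just illustrative data):

import numpy as np

X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]], dtype=float)

Xc = X - X.mean(axis=0)                 # center each column
cov = np.cov(Xc, rowvar=False)          # covariance matrix of the features
eigvals, eigvecs = np.linalg.eigh(cov)  # eigendecomposition of the symmetric matrix

order = np.argsort(eigvals)[::-1]       # sort axes by explained variance, largest first
eigvals = eigvals[order]
eigvecs = eigvecs[:, order]

print(eigvals)    # variance captured along each new axis
print(eigvecs.T)  # each row: one principal component, a linear combination of all columns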

In my experience, if the cumulative sum of the eigenvalues covers 80% or 90% of the total variance, the transformed vectors will be enough to represent the original vectors.
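As a sketch of how you could check that threshold (the random matrix here is hypothetical, just to have several features):

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
X = rng.rand(100, 5)                         # hypothetical data with 5 features

pca = PCA().fit(X)                           # keep all components first
cumulative = np.cumsum(pca.explained_variance_ratio_)
k = int(np.argmax(cumulative >= 0.90)) + 1   # smallest k reaching 90% of the variance
print(cumulative)
print("components needed:", k)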

To explain clearly, let's use @Nicolas M.'s code:

import numpy as np
from sklearn.decomposition import PCA

X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
pca = PCA(n_components=1)  # keep only the first principal component
pca.fit(X)

Increase n_components until the cumulative explained variance reaches 90%.

Input:

pca.explained_variance_ratio_

Output:

array([0.99244289])

In this example, just 1 component is enough.
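As a convenience, scikit-learn also accepts a float between 0 and 1 for n_components and keeps the smallest number of components whose cumulative explained variance reaches that fraction. A small sketch (depending on your scikit-learn version you may need svd_solver="full" for float values, so it is set explicitly here):

from sklearn.decomposition import PCA
import numpy as np

X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
pca = PCA(n_components=0.90, svd_solver="full")  # keep enough components for 90% variance
pca.fit(X)
print(pca.n_components_)  # number of components actually kept (1 for this data)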

I hope this is all clear.

Resources:

https://towardsdatascience.com/pca-using-python-scikit-learn-e653f8989e60
https://towardsdatascience.com/a-step-by-step-explanation-of-principal-component-analysis-b836fb9c97e2

Upvotes: 4

Nicolas M.

Reputation: 1478

You have to look at the eigenvectors of the PCA. Each eigenvalue gives the "force" of its "new axis", and the corresponding eigenvector provides the linear combination of your original features. Note that PCA does not select a subset of your original columns: every principal component is a mixture of all of them.

With scikit-learn, you should look at the attribute components_:

import numpy as np
from sklearn.decomposition import PCA

X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
pca = PCA(n_components=2)
pca.fit(X)
print(pca.components_)          # << eigenvector matrix (one row per component)
print(pca.explained_variance_)  # << eigenvalues (variance along each new axis)
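To connect this back to the original question, you can inspect the loadings in components_ to see how strongly each original column contributes to each retained component. A small sketch (the column names here are invented for illustration):

import numpy as np
from sklearn.decomposition import PCA

X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
feature_names = ["feat_a", "feat_b"]   # hypothetical column names

pca = PCA(n_components=2).fit(X)
for i, component in enumerate(pca.components_):
    order = np.argsort(np.abs(component))[::-1]  # columns ranked by absolute loading
    pairs = [(feature_names[j], round(float(component[j]), 3)) for j in order]
    print("PC%d:" % (i + 1), pairs)

The column whose loading has the largest absolute value is the one contributing most to that component, which is as close as PCA gets to "selecting" a column.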

Upvotes: 1
