Reputation: 19
I need to find the features that have the maximum correlation with each of the 2 principal components. This is a training task, and my result is wrong (all 4 features come out more correlated with the first component):
import numpy as np
from sklearn import datasets
iris = datasets.load_iris()
data = iris.data
target = iris.target
target_names = iris.target_names
means = np.mean(data, axis=0)
X = (data - means)
from sklearn.decomposition import PCA
model = PCA(n_components=2)
model.fit(X)
proect_data = model.transform(X)
proect_data_abs = np.absolute(proect_data)
means = np.mean(proect_data_abs, axis=0)
Y = (proect_data_abs - means)
corr_array = np.corrcoef(X.T, Y.T)
Upvotes: 1
Views: 103
Reputation: 613
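You do not provide any justification for taking the absolute value of your transformed data, and it is unclear why you do it. Taking the absolute value discards the sign of the projections, so the correlations you compute afterwards are no longer correlations with the principal components themselves.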
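As a toy illustration of the effect (synthetic data, not the iris set from the question): for values spread symmetrically around zero, the correlation with the values themselves is 1, while the correlation with their absolute values collapses to roughly 0.

```python
import numpy as np

# Hypothetical toy data: 201 points spread symmetrically around zero
x = np.linspace(-1.0, 1.0, 201)

# Correlation of x with itself versus with its absolute value
r_same = np.corrcoef(x, x)[0, 1]
r_abs = np.corrcoef(x, np.abs(x))[0, 1]

print(f"corr(x, x)   = {r_same:.3f}")
print(f"corr(x, |x|) = {r_abs:.3f}")
```

The projections in the question are not perfectly symmetric, so the effect there is less extreme, but the same mechanism distorts the result.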
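If the absolute value is removed (which also makes subtracting the mean a second time unnecessary), you get the expected results, and it is easy to read off which features have the highest correlation with each principal component. Rows 4 and 5 of the matrix returned by np.corrcoef correspond to the two components, and columns 0 to 3 correspond to the four features: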
Y = proect_data
corr_array = np.corrcoef(X.T, Y.T)
corr_array[4:,:4]
array([[ 0.89754488, -0.38999338, 0.99785405, 0.96648418],
[ 0.39023141, 0.82831259, -0.04903006, -0.04818017]])
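If you would rather not read the answer off by eye, here is a minimal self-contained sketch that picks the strongest feature per component by absolute correlation. The variable names (`scores`, `corr`, `best_feature`, `best_component`) are my own, and the comparison uses `np.abs` on the correlations (not on the projected data), since the sign of a principal component is arbitrary.

```python
import numpy as np
from sklearn import datasets
from sklearn.decomposition import PCA

iris = datasets.load_iris()
X = iris.data - iris.data.mean(axis=0)  # center the features

# Project onto the first two principal components
scores = PCA(n_components=2).fit_transform(X)

# Rows 0-3 of the full matrix are features, rows 4-5 are components;
# keep only the component-vs-feature block (shape 2 x 4)
corr = np.corrcoef(X.T, scores.T)[4:, :4]

# Strongest feature for each component (compare magnitudes, keep the sign for display)
best_feature = np.abs(corr).argmax(axis=1)
for i, j in enumerate(best_feature):
    print(f"PC{i + 1}: {iris.feature_names[j]} (r = {corr[i, j]:.3f})")

# The reverse view: which component each feature correlates with most
best_component = np.abs(corr).argmax(axis=0)
```

With the correlations shown above, this selects petal length for the first component and sepal width for the second. Only sepal width is more strongly correlated with the second component than with the first.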
Upvotes: 2