Reputation: 85
I am performing PCA on my data set and I get the right results. But when I try to visualize the PCA, nothing shows up. Here is my attempt:
#Import dataset
import pandas as pd
import matplotlib.pyplot as plt
dataset = pd.read_csv('Data.csv', names=['0','1','2','3','target'],header=0)
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, -1].values
#PCA
from sklearn.decomposition import PCA
pca = PCA(n_components = 2)
X = pca.fit_transform(X)
principalDf = pd.DataFrame(data = X, columns = ['principal component 1',
                                                'principal component 2'])
finalDf = pd.concat([principalDf, dataset['target']], axis = 1)
#Visualizing
fig = plt.figure(figsize = (8,8))
ax = fig.add_subplot(1,1,1)
ax.set_xlabel('Principal Component 1', fontsize = 15)
ax.set_ylabel('Principal Component 2', fontsize = 15)
ax.set_title('2 component PCA', fontsize = 20)
targets = ['1', '2', '3', '4']
colors = ['r', 'g', 'b', 'hotpink']
for target, color in zip(targets,colors):
    indicesToKeep = finalDf['target'] == target
    ax.scatter(finalDf.loc[indicesToKeep, 'principal component 1']
               , finalDf.loc[indicesToKeep, 'principal component 2']
               , c = color
               , s = 50)
ax.legend(targets)
ax.grid()
But this is not working, and I cannot figure out why. How can I fix this?
Upvotes: 0
Views: 1082
Reputation: 4275
You are indexing your finalDf to get slices of the DataFrame where the target column values are the same as a single target from the list targets = ['1', '2', '3', '4'].
Since the values in your target column are not of str type but of a numerical type, no data satisfies this condition:
'1' != 1
'2' != 2
'3' != 3
'4' != 4
And thus no data is plotted.
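You can check this on a small example. The sketch below uses a made-up DataFrame as a stand-in for finalDf (the real values come from Data.csv, which I don't have), and compares the integer column first against a string and then against an integer:

```python
import pandas as pd

# Hypothetical stand-in for finalDf; only the target column matters here.
df = pd.DataFrame({"target": [1, 2, 3, 4, 1]})

str_mask = df["target"] == "1"  # integer column compared with a string
int_mask = df["target"] == 1    # integer column compared with an integer

n_str_matches = int(str_mask.sum())  # rows selected by the string comparison
n_int_matches = int(int_mask.sum())  # rows selected by the integer comparison

print(df["target"].dtype)
print("rows matching '1':", n_str_matches)
print("rows matching 1:", n_int_matches)
```

The string comparison selects no rows, so every ax.scatter call in the loop receives empty data. Printing finalDf['target'].dtype on your own data is a quick way to confirm which type you actually have.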
To get the slices you want, instead of targets = ['1', '2', '3', '4'] you should use numerical values as targets:
targets = [1, 2, 3, 4]
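Here is a minimal sketch of the plotting loop with numeric targets. The data is randomly generated and only stands in for your finalDf (an assumption, since I can't see Data.csv), and the Agg backend line is there only so the sketch runs without a display. I also pass label= to scatter instead of giving a list to ax.legend, which is a small change from your code:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for finalDf: random 2-D points with integer labels 1..4.
rng = np.random.default_rng(0)
finalDf = pd.DataFrame({
    "principal component 1": rng.normal(size=40),
    "principal component 2": rng.normal(size=40),
    "target": np.repeat([1, 2, 3, 4], 10),
})

targets = [1, 2, 3, 4]  # numeric, matching the dtype of the target column
colors = ["r", "g", "b", "hotpink"]

fig, ax = plt.subplots(figsize=(8, 8))
points_per_target = {}
for target, color in zip(targets, colors):
    indicesToKeep = finalDf["target"] == target
    ax.scatter(
        finalDf.loc[indicesToKeep, "principal component 1"],
        finalDf.loc[indicesToKeep, "principal component 2"],
        c=color,
        s=50,
        label=str(target),  # legend entry for this group
    )
    points_per_target[target] = int(indicesToKeep.sum())
ax.legend()
ax.grid()
# In a script with an interactive backend, call plt.show() here to open the window.
```

If you are not sure how the labels are stored, you could also build the list from the data itself with targets = sorted(finalDf['target'].unique()), so the comparison always uses the column's own type.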
Upvotes: 1