I'm doing a small personal project with PCA. After plotting the graph, I can't get the points to show in colour. Before the steps below, I scaled and transformed the data, then ran PCA on the scaled data. These are the steps. First:
import pandas as pd
from sklearn.decomposition import PCA
pca = PCA(n_components=2)
pca.fit(scaled_data)
x_pca = pca.transform(scaled_data)
principaldf = pd.DataFrame(data=x_pca,
                           columns=['principal component 1',
                                    'principal component 2'])
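For reference, the scale-then-PCA steps described above can be sketched end to end on made-up data. Here raw is a hypothetical stand-in for the real feature table (261 rows, matching the error message below), and StandardScaler is an assumption about how the data was scaled; fit_transform gives the same result as calling fit() and then transform(), up to floating-point rounding.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for the real numeric features: 261 rows, 4 columns
rng = np.random.default_rng(0)
raw = pd.DataFrame(rng.normal(size=(261, 4)), columns=['f1', 'f2', 'f3', 'f4'])

# Assumed scaling step: zero mean and unit variance per feature
scaled_data = StandardScaler().fit_transform(raw)

# Fit PCA and project the data onto the first two components in one call
pca = PCA(n_components=2)
x_pca = pca.fit_transform(scaled_data)

principaldf = pd.DataFrame(data=x_pca,
                           columns=['principal component 1',
                                    'principal component 2'])

print(x_pca.shape)                    # one row per sample, one column per component
print(pca.explained_variance_ratio_)  # share of variance captured by each component
```

The key point for the plotting step is that x_pca has one row per sample, so anything passed to scatter's c argument needs to line up with those rows.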
After this, I combined both DataFrames and got this result:
new_df= principaldf.join(df_features)
new_df.head()
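One thing worth checking at this step: join matches rows by index label, not by position. That works as long as df_features still has the default 0..n-1 index and scaled_data was built from the same rows in the same order. If rows were dropped earlier (for example with dropna), the labels no longer line up and some rows get NaN. A small sketch with made-up frames (the values here are hypothetical):

```python
import pandas as pd

# Made-up principal components computed from 4 rows, default index 0..3
principaldf = pd.DataFrame({'principal component 1': [0.1, 0.2, 0.3, 0.4],
                            'principal component 2': [1.0, 1.1, 1.2, 1.3]})

# Made-up features where row 1 was dropped: index is now 0, 2, 3, 4
df_features = pd.DataFrame({'Gender': ['Male', 'Female', 'Male', 'Female', 'Male']})
df_features = df_features.drop(index=1)

# Joining on mismatched labels leaves label 1 without a match (NaN)
misaligned = principaldf.join(df_features)
print(misaligned['Gender'].isna().sum())

# Resetting the index restores positional alignment before joining
aligned = principaldf.join(df_features.reset_index(drop=True))
print(aligned['Gender'].tolist())
```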
Next, I attempted to plot the graph with the code below:
import matplotlib.pyplot as plt
color = ['r', 'g']
plt.scatter(x_pca[:, 0], x_pca[:, 1],
            edgecolor='none', alpha=0.5, c=color)
plt.xlabel('component 1')
plt.ylabel('component 2')
I got this error:
ValueError: 'c' argument has 2 elements, which is not acceptable for use with 'x' with size 261, 'y' with size 261.
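The error can be reproduced without the real data. This sketch uses random points as a stand-in for x_pca (an assumption) and a hypothetical helper, try_scatter, that reports whether scatter accepts a given c. A single colour or one colour per point is accepted; a two-item list for 261 points is not. The exact message wording varies between matplotlib versions.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
x_pca = rng.normal(size=(261, 2))  # stand-in for the real PCA output

def try_scatter(c):
    """Hypothetical helper: None if scatter accepts c, else the ValueError message."""
    fig, ax = plt.subplots()
    try:
        ax.scatter(x_pca[:, 0], x_pca[:, 1], c=c)
        return None
    except ValueError as err:
        return str(err)
    finally:
        plt.close(fig)

print(try_scatter(['r', 'g']))                 # 2 entries for 261 points: rejected
print(try_scatter('r'))                        # one colour for every point: accepted
print(try_scatter(['r'] * 130 + ['g'] * 131))  # one colour per point: accepted
```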
Can anyone please advise? I hope my question is clear enough. Thanks!
You can try something like:
Assign numerical values to Gender:
new_df['Gender'] = new_df['Gender'].replace({'Male':0, 'Female':1})
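A caveat on this step, assuming the column might contain spellings other than exactly 'Male' and 'Female': replace leaves unmatched values untouched as strings, which will then break the numeric c. Using map with the same dictionary instead turns unmatched values into NaN, which is easy to check for. A sketch on a hypothetical column:

```python
import pandas as pd

# Hypothetical Gender column with one unexpected spelling
gender = pd.Series(['Male', 'Female', 'female', 'Male'])

replaced = gender.replace({'Male': 0, 'Female': 1})
mapped = gender.map({'Male': 0, 'Female': 1})

print(replaced.tolist())    # unmatched 'female' is left as a string
print(mapped.tolist())      # unmatched 'female' becomes NaN
print(mapped.isna().sum())  # count of values the mapping did not cover
```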
Then plot, passing the numeric column as c together with a colormap via cmap:
plt.scatter(x_pca[:, 0], x_pca[:, 1], edgecolor='none', alpha=0.5,
            c=new_df['Gender'], cmap='RdYlGn')
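If you want the exact red and green from the question plus a legend, another option (not part of the answer above) is one scatter call per group. This is a sketch on made-up data: new_df here is a hypothetical stand-in holding the two PCA columns and a text Gender column.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for new_df
rng = np.random.default_rng(1)
new_df = pd.DataFrame({
    'principal component 1': rng.normal(size=10),
    'principal component 2': rng.normal(size=10),
    'Gender': ['Male', 'Female'] * 5,
})

fig, ax = plt.subplots()
# One scatter call per group, so each group gets its own colour and legend entry
for label, colour in [('Male', 'r'), ('Female', 'g')]:
    group = new_df[new_df['Gender'] == label]
    ax.scatter(group['principal component 1'], group['principal component 2'],
               color=colour, edgecolor='none', alpha=0.5, label=label)
ax.set_xlabel('component 1')
ax.set_ylabel('component 2')
ax.legend()
```

Because each call only receives the points of one group, a single colour per call is valid, and label gives the legend its entries.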
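When you pass in a two-item list like ['r', 'g'] for 261 points, matplotlib doesn't know which points should be which colour. The c argument must be either a single colour, one colour per point, or one number per point that cmap then turns into a colour.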
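Following that rule with the colours from the question, one approach is to build a list with one colour per point by mapping each Gender value to a colour. A sketch on a hypothetical four-row new_df:

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data: four points with a text Gender column
new_df = pd.DataFrame({'principal component 1': [0.1, -0.4, 1.2, 0.7],
                       'principal component 2': [0.3, 0.9, -0.5, 0.0],
                       'Gender': ['Male', 'Female', 'Female', 'Male']})

# One colour per point, so len(colours) equals the number of points
colours = new_df['Gender'].map({'Male': 'r', 'Female': 'g'}).tolist()
print(colours)

fig, ax = plt.subplots()
ax.scatter(new_df['principal component 1'], new_df['principal component 2'],
           c=colours, edgecolor='none', alpha=0.5)
```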