Jonathan

Reputation: 434

PCA can't get color on scatterplot

I'm doing a mini project on my own and experimenting with PCA. After plotting my graph, I can't get the points to show up in different colors. Before this, I scaled and transformed the data, and then ran PCA on the scaled data. These are the steps. First:

import pandas as pd
from sklearn.decomposition import PCA

# Project the scaled data onto the first two principal components
pca = PCA(n_components=2)
pca.fit(scaled_data)
x_pca = pca.transform(scaled_data)

principaldf = pd.DataFrame(data=x_pca,
                           columns=['principal component 1',
                                    'principal component 2'])

After this, I joined the two dataframes:

new_df= principaldf.join(df_features)
new_df.head()

Next, I attempted to plot the graph with the code below:

color= ['r','g']
plt.scatter(x_pca[:, 0], x_pca[:, 1],
     edgecolor='none', alpha=0.5, c= color)
plt.xlabel('component 1')
plt.ylabel('component 2')

I got this error:

ValueError: 'c' argument has 2 elements, which is not acceptable for use with 'x' with size 261, 'y' with size 261.

Can anyone please advise or help? I hope my question is clear enough. Thanks!

Upvotes: 2

Views: 1748

Answers (1)

Simon

Reputation: 10150

You can try something like:

Assign numerical values to Gender:

new_df['Gender'] = new_df['Gender'].replace({'Male':0, 'Female':1})
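If the column might hold labels other than these two, a variant using map makes that visible, because any unmapped label becomes NaN. This is only a sketch: the small Series below is made-up data standing in for new_df['Gender'], and the 'Other' label is a hypothetical example.

```python
import pandas as pd

# Made-up stand-in for new_df['Gender']; 'Other' is a hypothetical extra label.
genders = pd.Series(['Male', 'Female', 'Male', 'Other'], name='Gender')

# map() turns every label that is missing from the dict into NaN,
# so unexpected categories are easy to spot before plotting.
codes = genders.map({'Male': 0, 'Female': 1})

# Count the rows that did not match either label.
n_unmapped = int(codes.isna().sum())
print(n_unmapped)
```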

Then plot using color and cmap:

plt.scatter(x_pca[:, 0], x_pca[:, 1], edgecolor='none', alpha=0.5,
            c=new_df['Gender'], cmap='RdYlGn')

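When you pass in a two-item list like ['r', 'g'], matplotlib doesn't know which points should get which colour. The c argument needs to be either a single colour or a sequence with one entry per point (261 here), which is what the error message is telling you.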
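If you would rather keep 'r' and 'g' as the colours, one option is to build a list with one colour per row. The sketch below is self-contained, so it uses random points in place of your x_pca and a randomly generated Gender column. Both are stand-ins for your data, and the names palette and point_colors are made up for the example.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Stand-in data: 261 random 2-D points instead of the real PCA output,
# plus a randomly generated Gender column (both are assumptions).
rng = np.random.default_rng(0)
x_pca = rng.normal(size=(261, 2))
genders = pd.Series(rng.choice(['Male', 'Female'], size=261), name='Gender')

# Map every row's label to a colour, giving one colour per point.
palette = {'Male': 'r', 'Female': 'g'}
point_colors = genders.map(palette).tolist()

fig, ax = plt.subplots()
ax.scatter(x_pca[:, 0], x_pca[:, 1],
           c=point_colors, edgecolor='none', alpha=0.5)
ax.set_xlabel('component 1')
ax.set_ylabel('component 2')
# Call plt.show() here to display the figure interactively.
```

With your own data you would pass new_df['Gender'].map(palette).tolist() as c, provided the rows of new_df are in the same order as the rows of x_pca.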

Upvotes: 2
