Div Trivedi

Reputation: 91

Labeling matplotlib.pyplot.scatter with pandas dataframe

I have a pandas DataFrame whose values I want to use as labels for the points on a scatter plot. The data comes from clustering: the DataFrame holds, for each point, the cluster it belongs to. I would like to show those labels on the scatter plot below. I tried using annotate and got an error. Here is my code for the scatter plot:

 import hdbscan
 import numpy as np
 import seaborn as sns
 import matplotlib.pyplot as plt
 import pandas as pd
 import umap 
 from sklearn.decomposition import PCA
 import sklearn.cluster as cluster
 from sklearn.metrics import adjusted_rand_score, adjusted_mutual_info_score

 se1= umap.UMAP(n_neighbors = 20,random_state=42).fit_transform(data_1)

 cluster_1 = hdbscan.HDBSCAN(min_cluster_size = 15, min_samples =3).fit_predict(se1)
 clustered = (cluster_1 >=0)
 plt.scatter(se1[~clustered,0],se1[~clustered,1],c=(0.5,0.5,0.5), s=5, alpha =0.5)
 plt.scatter(se1[clustered,0], se1[clustered,1], c=cluster_1[clustered],s=5, cmap='prism');
 plt.show()


How can I add df1 (960 rows x 1 column) as labels on all points in the scatter plot above?

  df1 = pd.DataFrame(cluster_1)
  plt.annotate(cluster_3,se3[clustered,0], se3[clustered,1])

Error:

 Traceback (most recent call last):
   File "", line 1, in
   File "C:\Users\trivedd\AppData\Local\Continuum\anaconda3\lib\site-packages\matplotlib\pyplot.py", line 2388, in annotate
     return gca().annotate(s, xy, *args, **kwargs)
   File "C:\Users\trivedd\AppData\Local\Continuum\anaconda3\lib\site-packages\matplotlib\axes\_axes.py", line 791, in annotate
     a = mtext.Annotation(s, xy, *args, **kwargs)
   File "C:\Users\trivedd\AppData\Local\Continuum\anaconda3\lib\site-packages\matplotlib\cbook\deprecation.py", line 307, in wrapper
     return func(*args, **kwargs)
   File "C:\Users\trivedd\AppData\Local\Continuum\anaconda3\lib\site-packages\matplotlib\text.py", line 2166, in __init__
     x, y = xytext
 ValueError: too many values to unpack (expected 2)

Upvotes: 2

Views: 8082

Answers (1)

JJJJ

Reputation: 308

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import string
%matplotlib inline
df = pd.DataFrame({'x':np.random.rand(10),'y':np.random.rand(10),'label':list(string.ascii_lowercase[:10])})

The DataFrame looks like this:

x   y   label
0.854133    0.020296    a
0.320214    0.857453    b
0.470433    0.103763    c
0.698247    0.869477    d
0.366012    0.127051    e
0.769241    0.767591    f
0.219338    0.351735    g
0.882301    0.311616    h
0.083092    0.159695    i
0.403883    0.460098    j

Try:

ax = df.plot(x='x',y='y',kind='scatter',figsize=(10,10))
df[['x','y','label']].apply(lambda x: ax.text(*x),axis=1)

This draws each row's label as text at its (x, y) position on the scatter plot.
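For the setup in the question, the ValueError comes from how annotate reads its arguments: it takes one text and one (x, y) pair per call, so the second array was treated as xytext and could not be unpacked into two values. A minimal sketch of a per-point loop follows. The names points and labels are stand-ins I made up for se1 and cluster_1, and the random data is only there to make the example runnable.

```python
import numpy as np
import matplotlib.pyplot as plt

# Stand-in data (assumption): 30 random 2-D points and integer cluster labels,
# with -1 marking noise points, as HDBSCAN does.
rng = np.random.default_rng(42)
points = rng.random((30, 2))           # plays the role of se1
labels = rng.integers(-1, 3, size=30)  # plays the role of cluster_1

clustered = labels >= 0

fig, ax = plt.subplots(figsize=(6, 6))
ax.scatter(points[~clustered, 0], points[~clustered, 1],
           color=(0.5, 0.5, 0.5), s=5, alpha=0.5)
ax.scatter(points[clustered, 0], points[clustered, 1],
           c=labels[clustered], s=5, cmap="prism")

# annotate() accepts a single string and a single (x, y) pair,
# so call it once per clustered point.
annotations = [
    ax.annotate(str(label), (x, y),
                xytext=(3, 3), textcoords="offset points", fontsize=8)
    for (x, y), label in zip(points[clustered], labels[clustered])
]
```

Call plt.show() afterwards to display the figure. With around 960 points the text will likely overlap heavily, so labeling a subset or using a legend may read better.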

Or, if you want to use a legend instead:

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
import numpy as np
import string
%matplotlib inline
df = pd.DataFrame({'x':np.random.rand(50), 'y':np.random.rand(50),'label': [int(x) for x in '12345'*10]})

fig, ax = plt.subplots(figsize=(5,5))
ax = sns.scatterplot(x='x',y='y',hue = 'label',data = df,legend='full',
                     palette = {1:'red',2:'orange',3:'yellow',4:'green',5:'blue'})
ax.legend(loc='lower left')

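If you would rather not depend on seaborn, a similar legend can probably be built with matplotlib alone. The sketch below assumes matplotlib 3.1 or newer, where the object returned by scatter has a legend_elements() method. The points and labels arrays are made-up stand-ins for the embedding and the cluster ids.

```python
import numpy as np
import matplotlib.pyplot as plt

# Stand-in data (assumption): 50 random points in 5 clusters labeled 1..5.
rng = np.random.default_rng(0)
points = rng.random((50, 2))
labels = np.tile(np.arange(1, 6), 10)

fig, ax = plt.subplots(figsize=(5, 5))
sc = ax.scatter(points[:, 0], points[:, 1], c=labels, cmap="viridis", s=20)

# legend_elements() returns one handle and one text per distinct color value
# when there are only a few distinct values, as is the case here.
handles, texts = sc.legend_elements()
legend = ax.legend(handles, texts, title="cluster", loc="lower left")
```

This keeps one legend entry per cluster instead of one text label per point, which tends to stay readable when there are hundreds of points.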

Upvotes: 6
