Labeling matplotlib.pyplot.scatter with pandas dataframe

Question

I have a pandas DataFrame whose values I want to use as labels for each point on a scatter plot. The data comes from clustering: the DataFrame holds, for each point, the cluster it belongs to. It would be helpful to show that on the scatter plot. I tried using annotate and got an error. Here is my code for the scatter plot:

 import hdbscan
 import numpy as np
 import seaborn as sns
 import matplotlib.pyplot as plt
 import pandas as pd
 import umap 
 from sklearn.decomposition import PCA
 import sklearn.cluster as cluster
 from sklearn.metrics import adjusted_rand_score, adjusted_mutual_info_score

 se1= umap.UMAP(n_neighbors = 20,random_state=42).fit_transform(data_1)

 cluster_1 = hdbscan.HDBSCAN(min_cluster_size = 15, min_samples =3).fit_predict(se1)
 clustered = (cluster_1 >=0)
 plt.scatter(se1[~clustered,0],se1[~clustered,1],c=(0.5,0.5,0.5), s=5, alpha =0.5)
 plt.scatter(se1[clustered,0], se1[clustered,1], c=cluster_1[clustered],s=5, cmap='prism');
 plt.show()

How can I add df1 (960 rows x 1 column) as labels to all points in the scatter plot above?

  df1 = pd.DataFrame(cluster_1)
  plt.annotate(cluster_3,se3[clustered,0], se3[clustered,1])

Error:

 Traceback (most recent call last):
   File "", line 1, in
   File "C:\Users rivedd\AppData\Local\Continuum\anaconda3\lib\site-packages\matplotlib\pyplot.py", line 2388, in annotate
     return gca().annotate(s, xy, *args, **kwargs)
   File "C:\Users rivedd\AppData\Local\Continuum\anaconda3\lib\site-packages\matplotlib\axes\_axes.py", line 791, in annotate
     a = mtext.Annotation(s, xy, *args, **kwargs)
   File "C:\Users rivedd\AppData\Local\Continuum\anaconda3\lib\site-packages\matplotlib\cbook\deprecation.py", line 307, in wrapper
     return func(*args, **kwargs)
   File "C:\Users rivedd\AppData\Local\Continuum\anaconda3\lib\site-packages\matplotlib\text.py", line 2166, in __init__
     x, y = xytext
 ValueError: too many values to unpack (expected 2)

JJJJ · Accepted Answer

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import string
# Jupyter/IPython magic; omit this line in a plain Python script
%matplotlib inline
df = pd.DataFrame({'x':np.random.rand(10),'y':np.random.rand(10),'label':list(string.ascii_lowercase[:10])})

The resulting df looks like this:

x   y   label
0.854133    0.020296    a
0.320214    0.857453    b
0.470433    0.103763    c
0.698247    0.869477    d
0.366012    0.127051    e
0.769241    0.767591    f
0.219338    0.351735    g
0.882301    0.311616    h
0.083092    0.159695    i
0.403883    0.460098    j

Try:

ax = df.plot(x='x',y='y',kind='scatter',figsize=(10,10))
df[['x','y','label']].apply(lambda x: ax.text(*x),axis=1)

This draws each row's label as text at that row's (x, y) position.
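The same idea can be carried back to the NumPy arrays in the question. As far as I can tell, the error there comes from annotate taking one text and one (x, y) pair per call: the third positional argument is read as xytext, and a whole array cannot be unpacked into x, y. A minimal sketch of the per-point loop follows. The names embedding and labels are stand-ins I made up for se1 and cluster_1, filled with random data so that the snippet runs without umap or hdbscan; the Agg backend line is only there so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the question's arrays:
# `embedding` plays the role of se1, `labels` the role of cluster_1.
embedding = rng.random((30, 2))
labels = rng.integers(-1, 3, size=30)  # -1 marks noise, as HDBSCAN does

clustered = labels >= 0

fig, ax = plt.subplots(figsize=(6, 6))
ax.scatter(embedding[~clustered, 0], embedding[~clustered, 1],
           color=(0.5, 0.5, 0.5), s=5, alpha=0.5)
ax.scatter(embedding[clustered, 0], embedding[clustered, 1],
           c=labels[clustered], s=5, cmap="prism")

# One annotate() call per point: a single text and a single (x, y) pair.
annotations = [
    ax.annotate(str(label), (x, y), fontsize=8)
    for (x, y), label in zip(embedding[clustered], labels[clustered])
]
```

With several hundred points the text will overlap heavily, so it may be worth annotating only a subset of the points.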

Or, if you would rather use a legend than per-point text:

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
import numpy as np
# Jupyter/IPython magic; omit this line in a plain Python script
%matplotlib inline
df = pd.DataFrame({'x':np.random.rand(50), 'y':np.random.rand(50),'label': [int(x) for x in '12345'*10]})

fig, ax = plt.subplots(figsize=(5,5))
ax = sns.scatterplot(x='x',y='y',hue = 'label',data = df,legend='full',
                     palette = {1:'red',2:'orange',3:'yellow',4:'green',5:'blue'})
ax.legend(loc='lower left')
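If seaborn is not available, a similar legend can probably be built with matplotlib alone. The sketch below assumes matplotlib 3.1 or newer, where the object returned by scatter has a legend_elements() method. The embedding and labels arrays are made-up stand-ins for the question's se1 and cluster_1, and the Agg backend line is only there so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins: 40 random 2-D points in 4 clusters.
embedding = rng.random((40, 2))
labels = np.arange(40) % 4

fig, ax = plt.subplots(figsize=(5, 5))
points = ax.scatter(embedding[:, 0], embedding[:, 1],
                    c=labels, s=20, cmap="viridis")

# legend_elements() returns one handle and one label text per unique value of `c`
# when there are only a few distinct values, as here.
handles, texts = points.legend_elements()
legend = ax.legend(handles, texts, title="cluster", loc="lower left")
```

Unlike the seaborn call, the colours here come from the colormap rather than an explicit palette dictionary.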
