Reputation: 91
I have a pandas DataFrame whose values I want to use as labels for each point on a scatter plot. The data comes from clustering: the DataFrame holds, for each point, the cluster it belongs to. I would like to show those labels on the scatter plot below. I tried using annotate and got an error. Here is my code for the scatter plot:
import hdbscan
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
import umap
from sklearn.decomposition import PCA
import sklearn.cluster as cluster
from sklearn.metrics import adjusted_rand_score, adjusted_mutual_info_score
se1 = umap.UMAP(n_neighbors=20, random_state=42).fit_transform(data_1)
cluster_1 = hdbscan.HDBSCAN(min_cluster_size=15, min_samples=3).fit_predict(se1)
clustered = (cluster_1 >= 0)
plt.scatter(se1[~clustered, 0], se1[~clustered, 1], c=(0.5, 0.5, 0.5), s=5, alpha=0.5)
plt.scatter(se1[clustered, 0], se1[clustered, 1], c=cluster_1[clustered], s=5, cmap='prism')
plt.show()
How can I add df1 (960 rows x 1 column) as labels to all points in the scatter plot above? This is what I tried:
df1 = pd.DataFrame(cluster_1)
plt.annotate(cluster_3,se3[clustered,0], se3[clustered,1])
Error:
Traceback (most recent call last):
  File "", line 1, in
  File "C:\Users\trivedd\AppData\Local\Continuum\anaconda3\lib\site-packages\matplotlib\pyplot.py", line 2388, in annotate
    return gca().annotate(s, xy, *args, **kwargs)
  File "C:\Users\trivedd\AppData\Local\Continuum\anaconda3\lib\site-packages\matplotlib\axes\_axes.py", line 791, in annotate
    a = mtext.Annotation(s, xy, *args, **kwargs)
  File "C:\Users\trivedd\AppData\Local\Continuum\anaconda3\lib\site-packages\matplotlib\cbook\deprecation.py", line 307, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\trivedd\AppData\Local\Continuum\anaconda3\lib\site-packages\matplotlib\text.py", line 2166, in __init__
    x, y = xytext
ValueError: too many values to unpack (expected 2)
Upvotes: 2
Views: 8082
Reputation: 308
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import string
%matplotlib inline
df = pd.DataFrame({'x':np.random.rand(10),'y':np.random.rand(10),'label':list(string.ascii_lowercase[:10])})
df looks like this:
x y label
0.854133 0.020296 a
0.320214 0.857453 b
0.470433 0.103763 c
0.698247 0.869477 d
0.366012 0.127051 e
0.769241 0.767591 f
0.219338 0.351735 g
0.882301 0.311616 h
0.083092 0.159695 i
0.403883 0.460098 j
Try:
ax = df.plot(x='x',y='y',kind='scatter',figsize=(10,10))
df[['x','y','label']].apply(lambda x: ax.text(*x),axis=1)
This draws each row's label at its (x, y) position on the scatter plot.
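The same idea can be applied to the arrays in the question. The ValueError there appears to come from the fact that annotate places one text at one (x, y) point: in the call shown, the second array of coordinates ends up as the xytext argument, which matplotlib then tries to unpack into a single x, y pair. Calling annotate once per point avoids this. Below is a minimal sketch, assuming random stand-in data: `points` and `labels` are hypothetical replacements for se1 and cluster_1, not the asker's real data.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical stand-ins for the question's arrays:
# `points` plays the role of se1 (n x 2 embedding),
# `labels` plays the role of cluster_1 (-1 marks noise, as in HDBSCAN).
points = rng.random((30, 2))
labels = rng.integers(-1, 3, size=30)

clustered = labels >= 0

fig, ax = plt.subplots(figsize=(6, 6))
ax.scatter(points[~clustered, 0], points[~clustered, 1],
           color=(0.5, 0.5, 0.5), s=5, alpha=0.5)
ax.scatter(points[clustered, 0], points[clustered, 1],
           c=labels[clustered], s=5, cmap="prism")

# annotate() takes ONE text and ONE (x, y) pair, so loop over the points.
for (x, y), label in zip(points[clustered], labels[clustered]):
    ax.annotate(str(label), (x, y), fontsize=8)

# plt.show()  # uncomment in an interactive session
```

With 960 points the labels will overlap heavily, so it may be more readable to annotate only a subset, or to label one representative point per cluster.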
Or, if you want to use a legend instead:
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
import numpy as np
import string
%matplotlib inline
df = pd.DataFrame({'x':np.random.rand(50), 'y':np.random.rand(50),'label': [int(x) for x in '12345'*10]})
fig, ax = plt.subplots(figsize=(5,5))
ax = sns.scatterplot(x='x',y='y',hue = 'label',data = df,legend='full',
palette = {1:'red',2:'orange',3:'yellow',4:'green',5:'blue'})
ax.legend(loc='lower left')
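If you would rather stay with plain matplotlib, as in the question, a similar legend can be built from the scatter object itself. This is a rough sketch, assuming matplotlib 3.1 or newer (where `legend_elements()` is available on the object returned by `scatter`) and using made-up `points` and `labels` arrays in place of the real embedding and cluster ids.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 60 random 2-D points in three clusters of 20 each.
points = rng.random((60, 2))
labels = np.repeat([0, 1, 2], 20)

fig, ax = plt.subplots(figsize=(5, 5))
sc = ax.scatter(points[:, 0], points[:, 1], c=labels, s=15, cmap="viridis")

# legend_elements() returns one handle and one text per distinct value in `c`
# (when there are only a few distinct values, as here).
handles, texts = sc.legend_elements()
ax.legend(handles, texts, title="cluster", loc="lower left")

# plt.show()  # uncomment in an interactive session
```

A legend keeps the plot uncluttered when there are many points, at the cost of not labelling each point individually.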
Upvotes: 6