Reputation: 33197
I want to plot the correlation of two variables using seaborn's jointplot. I have tried many different things, but I cannot color the points according to their class.
Here is my code:
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
sns.set()
X = np.array([5.2945 , 3.6013 , 3.9675 , 5.1602 , 4.1903 , 4.4995 , 4.5234 ,
4.6618 , 0.76131, 0.42036, 0.71092, 0.60899, 0.66451, 0.55388,
0.63863, 0.62504, 0. , 0. , 0.49364, 0.44828, 0.43066,
0.57368, 0. , 0. , 0.64824, 0.65166, 0.64968, 0. ,
0. , 0.52522, 0.58259, 1.1309 , 0. , 0. , 1.0514 ,
0.7519 , 0.78745, 0.94873, 1.0169 , 0. , 0. , 1.0416 ,
0. , 0. , 0.93648, 0.92801, 0. , 0. , 0.89594,
0. , 0.80455, 1.0103 ])
y = np.array([ 93, 115, 107, 115, 110, 107, 102, 113, 95, 101, 116, 74, 102,
102, 78, 85, 108, 110, 109, 80, 91, 88, 99, 110, 108, 96,
105, 93, 107, 98, 88, 75, 106, 92, 82, 84, 84, 92, 115,
107, 97, 115, 85, 133, 100, 65, 96, 105, 112, 107, 107, 105])
# jointplot returns a JointGrid; seaborn >= 0.12 requires x and y as keywords
g = sns.jointplot(x=X, y=y, kind='reg')
g.set_axis_labels(xlabel='Brain scores', ylabel='Cognitive scores')
plt.tight_layout()
plt.show()
Now I want to color each point according to a class variable, classes.
Upvotes: 8
Views: 13613
Reputation: 180
label Method 2 Method 1
0 Label 1 1.484914 -1.069439
1 Label 1 0.273158 1.139414
2 Label 1 1.089244 0.161752
3 Label 1 1.184306 -0.981758
4 Label 1 1.424435 0.300742
.. ... ... ...
111 Label 2 -0.201226 0.852319
112 Label 2 0.016911 0.985805
113 Label 2 -0.263775 0.248942
114 Label 2 3.283341 -1.247014
115 Label 2 0.325648 1.793694
[116 rows x 3 columns]
sns.jointplot(data=data, x="Method 1", y="Method 2", hue="label", palette={
'Label 1': '#d7191c',
'Label 2': '#2b83ba'
})
Use joint_kws={"alpha": 0.5} to set the transparency of the points.
Upvotes: 0
Reputation: 2896
To build on Ernest's answer: after setting scatter=False in sns.jointplot, draw the points with sns.scatterplot and set its hue argument to the array holding the categorical variable. I find it cleanest to merge the data into a pandas DataFrame with the columns x, y and classes and pass that as the data argument of the scatterplot, but you don't have to do it this way:
classes = np.array([1., 1., 1., 1., 1., 1., 1., 1., 2., 2., 2., 2., 2., 2., 2.,
2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2.,
2., 3., 3., 3., 3., 3., 3., 3., 3., 3., 3., 3., 3., 3., 3.,
3., 3., 3., 3., 3., 3., 3.])
# make them look a little more 'categorical'
classes = classes.astype('int')
x = np.array([5.2945 , 3.6013 , 3.9675 , 5.1602 , 4.1903 , 4.4995 , 4.5234 ,
4.6618 , 0.76131, 0.42036, 0.71092, 0.60899, 0.66451, 0.55388,
0.63863, 0.62504, 0. , 0. , 0.49364, 0.44828, 0.43066,
0.57368, 0. , 0. , 0.64824, 0.65166, 0.64968, 0. ,
0. , 0.52522, 0.58259, 1.1309 , 0. , 0. , 1.0514 ,
0.7519 , 0.78745, 0.94873, 1.0169 , 0. , 0. , 1.0416 ,
0. , 0. , 0.93648, 0.92801, 0. , 0. , 0.89594,
0. , 0.80455, 1.0103 ])
y = np.array([ 93, 115, 107, 115, 110, 107, 102, 113, 95, 101, 116, 74, 102,
102, 78, 85, 108, 110, 109, 80, 91, 88, 99, 110, 108, 96,
105, 93, 107, 98, 88, 75, 106, 92, 82, 84, 84, 92, 115,
107, 97, 115, 85, 133, 100, 65, 96, 105, 112, 107, 107, 105])
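# scatter=False is forwarded to regplot, so only the regression line is drawn
g = sns.jointplot(x=x, y=y, kind='reg', scatter=False)
# draw the class-colored points explicitly on the joint axes
sns.scatterplot(x=x, y=y, hue=classes, ax=g.ax_joint)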
Upvotes: 0
Reputation: 33197
I managed to find a solution that is exactly what I need. Thanks to @ImportanceOfBeingErnest, who gave me the idea to let regplot draw only the regression line.
Solution:
import pandas as pd
classes = np.array([1., 1., 1., 1., 1., 1., 1., 1., 2., 2., 2., 2., 2., 2., 2.,
2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2.,
2., 3., 3., 3., 3., 3., 3., 3., 3., 3., 3., 3., 3., 3., 3.,
3., 3., 3., 3., 3., 3., 3.])
df = pd.DataFrame(map(list, zip(*[X.T, y.ravel().T])))
df = df.reset_index()
df['index'] = classes[:]
g = sns.jointplot(x=X, y=y, kind='reg', scatter=False)
# one marginal KDE pair and one set of points per class
for i, subdata in df.groupby("index"):
    sns.kdeplot(x=subdata.iloc[:, 1], ax=g.ax_marg_x, legend=False)
    # y= replaces the deprecated vertical=True
    sns.kdeplot(y=subdata.iloc[:, 2], ax=g.ax_marg_y, legend=False)
    g.ax_joint.plot(subdata.iloc[:, 1], subdata.iloc[:, 2], "o", ms=8)
plt.tight_layout()
plt.show()
Upvotes: 3
Reputation: 339765
The obvious solution is to let regplot draw only the regression line, not the points, and to add the points with an ordinary matplotlib scatter plot, which has the color argument c.
g = sns.jointplot(x=X, y=y, kind='reg', scatter=False)
g.ax_joint.scatter(X, y, c=classes)
Upvotes: 9