Reputation: 55
I am trying to visualize correlations with Seaborn's FacetGrid and its .map method. However, it always returns plots whose titles and other text belong to a different plot, not to the one where they are displayed.
For example, the correlation line should be steeper in the panel where the R value is bigger, but the labels end up on the wrong panels.
I generate these plots with this code:
import numpy as np
import pandas as pd
import seaborn as sns
from matplotlib import pyplot as plt
values = [[True, 2, 5], [True, 3, 7], [True, 4, 9], [True, 5, 11], [True, 6, 13], [True, 7, 15],
          [False, 2, 3], [False, 3, 3], [False, 4, 15], [False, 5, 4], [False, 6, 1], [False, 7, 5]]
data = pd.DataFrame(values, columns = ["open",'col_A', 'col_B'])
group_by = "open"
grouped = data["col_A"].groupby(data[group_by])
correlation = grouped.corr(data["col_B"], method="pearson")
data = data[data["col_A"].notna()]
data = data[data["col_B"].notna()]
data["col_A"] = pd.to_numeric(data["col_A"])
data["col_B"] = pd.to_numeric(data["col_B"])
data["col_A"] = np.log(1 + data["col_A"])
data["col_B"] = np.log(1 + data["col_B"])
g = sns.FacetGrid(data, col=group_by, col_wrap=4)
g.map(sns.regplot, "col_A", "col_B")
col_order = data[group_by].unique()
print(type(col_order), col_order)
for txt, title in zip(g.axes.flat, col_order):
    txt.set_title(title)
    # add text
    txt.text(1.2, 1.2, "R = " + str(correlation[title]), fontsize = 12)
plt.show()
When I use sns.lmplot instead, the titles are correct:
sns.lmplot(data=data, x="col_A", y="col_B", col=group_by, col_wrap=2)
I suspect that col_order = data[group_by].unique() returns a different order than the one FacetGrid uses. How can I make the order correct and the same for both?
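The suspicion can be checked with pandas alone, without any plotting. In this minimal sketch on the same data, unique() returns the values in order of first appearance, while the groupby object sorts its keys (pandas' default sort=True), and False sorts before True. The bool() conversion is only there so the printed lists show plain Python booleans.

```python
import pandas as pd

values = [[True, 2, 5], [True, 3, 7], [True, 4, 9], [True, 5, 11], [True, 6, 13], [True, 7, 15],
          [False, 2, 3], [False, 3, 3], [False, 4, 15], [False, 5, 4], [False, 6, 1], [False, 7, 5]]
data = pd.DataFrame(values, columns=["open", "col_A", "col_B"])

# unique() keeps the order of first appearance; the True rows come first in this data.
appearance_order = [bool(v) for v in data["open"].unique()]

# groupby sorts its keys by default (sort=True), and False < True.
grouped = data["col_A"].groupby(data["open"])
sorted_order = [bool(k) for k in grouped.groups.keys()]

print(appearance_order)  # [True, False]
print(sorted_order)      # [False, True]
```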
Upvotes: 1
Views: 569
Reputation: 12410
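The problem here is that you compute the correlations from the groupby object, whose keys are sorted (False, True), but later you ignore this object and define col_order differently: unique() returns the values in order of appearance (True, False), which does not match the order of the FacetGrid panels. The solution is to take the order from the groups of the groupby object: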
import seaborn as sns
import numpy as np
from matplotlib import pyplot as plt
import pandas as pd
values = [[True, 2, 5], [True, 3, 7], [True, 4, 9], [True, 5, 11], [True, 6, 13], [True, 7, 15],
          [False, 2, 3], [False, 3, 3], [False, 4, 15], [False, 5, 4], [False, 6, 1], [False, 7, 5]]
data = pd.DataFrame(values, columns = ["open",'col_A', 'col_B'])
group_by = "open"
grouped = data["col_A"].groupby(data[group_by])
correlation = grouped.corr(data["col_B"], method="pearson")
data = data[data["col_A"].notna()]
data = data[data["col_B"].notna()]
data["col_A"] = pd.to_numeric(data["col_A"])
data["col_B"] = pd.to_numeric(data["col_B"])
data["col_A"] = np.log(1 + data["col_A"])
data["col_B"] = np.log(1 + data["col_B"])
g = sns.FacetGrid(data, col=group_by, col_wrap=2)
g.map(sns.regplot, "col_A", "col_B")
col_order = grouped.groups.keys()
for txt, title in zip(g.axes.flat, col_order):
    txt.set_title(title)
    txt.text(1.2, 1.2, f'R = {correlation[title]:.2}', fontsize = 12)
plt.show()
Two independent points:
Upvotes: 1