kvetjo

Reputation: 55

How to assign title and other texts to correct plots in facet grid

I am trying to visualize correlations with Seaborn's FacetGrid and its .map method. However, the resulting plots always have the wrong titles and annotations: the text shown on each plot belongs to a different plot.

For example (the regression line should be steeper where the R value is larger): [image]

I generate these plots with this code:

import numpy as np
import pandas as pd
import seaborn as sns
from matplotlib import pyplot as plt

values = [[True, 2, 5], [True, 3, 7], [True, 4, 9], [True, 5, 11], [True, 6, 13], [True, 7, 15],
          [False, 2, 3], [False, 3, 3], [False, 4, 15], [False, 5, 4], [False, 6, 1], [False, 7, 5]]

data = pd.DataFrame(values, columns = ["open",'col_A', 'col_B'])

group_by = "open"


grouped = data["col_A"].groupby(data[group_by])
correlation = grouped.corr(data["col_B"], method="pearson")
data = data[data["col_A"].notna()]
data = data[data["col_B"].notna()]
data["col_A"] = pd.to_numeric(data["col_A"])
data["col_B"] = pd.to_numeric(data["col_B"])
data["col_A"] = np.log(1 + data["col_A"])
data["col_B"] = np.log(1 + data["col_B"])

g = sns.FacetGrid(data, col=group_by, col_wrap=4)
g.map(sns.regplot, "col_A", "col_B")
col_order = data[group_by].unique() 
print(type(col_order), col_order)
for txt, title in zip(g.axes.flat, col_order):
    txt.set_title(title)   
    # add text
    txt.text(1.2, 1.2, "R = " + str(correlation[title]), fontsize = 12)
                        
plt.show()

When I use lmplot instead, it works fine:

sns.lmplot(data=data, x="col_A", y="col_B", col=group_by, col_wrap=2)

Probably the line col_order = data[group_by].unique() returns the groups in a different order than the one FacetGrid uses. How can I make the order correct and the same for both?

Upvotes: 1

Views: 569

Answers (1)

Mr. T

Reputation: 12410

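The problem here is that the facets and your loop do not agree on the order of the groups. You compute the correlations from a groupby object, whose keys are sorted (False, True), and the facets here are laid out in that same order. Later, however, you ignore this object and define col_order with unique(), which returns the values in order of first appearance (True, False). The solution is to access the groups of the groupby object: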

import seaborn as sns
import numpy as np
from matplotlib import pyplot as plt
import pandas as pd

values = [[True, 2, 5], [True, 3, 7], [True, 4, 9], [True, 5, 11], [True, 6, 13], [True, 7, 15],
[False, 2, 3], [False, 3, 3], [False, 4, 15], [False, 5, 4], [False, 6, 1], [False, 7, 5]]

data = pd.DataFrame(values, columns = ["open",'col_A', 'col_B'])

group_by = "open"

grouped = data["col_A"].groupby(data[group_by])
correlation = grouped.corr(data["col_B"], method="pearson")
data = data[data["col_A"].notna()]
data = data[data["col_B"].notna()]
data["col_A"] = pd.to_numeric(data["col_A"])
data["col_B"] = pd.to_numeric(data["col_B"])
data["col_A"] = np.log(1 + data["col_A"])
data["col_B"] = np.log(1 + data["col_B"])

g = sns.FacetGrid(data, col=group_by, col_wrap=2)
g.map(sns.regplot, "col_A", "col_B")
col_order = grouped.groups.keys()

for txt, title in zip(g.axes.flat, col_order):
    txt.set_title(title)
    # ".2f" shows R with two decimal places
    txt.text(1.2, 1.2, f'R = {correlation[title]:.2f}', fontsize = 12)

plt.show()

Sample output: [image]

Two independent points:

  1. R is not the slope of the regression line but the Pearson correlation coefficient, a measure of how closely the points follow a straight line.
  2. I changed the string formatting to an f-string. The advantage is that a format specifier lets you control how many digits of R the text shows.
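Both points can be checked on the raw numbers from the question with nothing but the standard library. This is a small illustrative sketch, not part of the plotting code; the helper pearson_r_and_slope and the variable names are made up for the example.

```python
from math import sqrt


def pearson_r_and_slope(x, y):
    """Return (Pearson R, least-squares slope) for paired samples."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy / sqrt(sxx * syy), sxy / sxx


col_a = [2, 3, 4, 5, 6, 7]
open_true = [5, 7, 9, 11, 13, 15]   # col_B where open is True
open_false = [3, 3, 15, 4, 1, 5]    # col_B where open is False

r_true, slope_true = pearson_r_and_slope(col_a, open_true)
r_false, slope_false = pearson_r_and_slope(col_a, open_false)

# Rescaling y multiplies the slope by 10 but leaves R unchanged.
r_scaled, slope_scaled = pearson_r_and_slope(col_a, [10 * b for b in open_true])

# ".2f" fixes two decimal places; ".2" alone means two significant digits.
label_fixed = f"R = {r_false:.2f}"
label_sig = f"R = {r_false:.2}"
print(label_fixed, "|", label_sig)
```

For the True group the slope is 2 while R is 1, and multiplying col_B by 10 changes only the slope, which is why a larger R does not have to come with a steeper line.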

Upvotes: 1
