Seaborn pairplot: Lost data in map_lower when using hue

Question

When I set hue to color my plot, map_lower calls its function more often and loses data compared to the equivalent call without hue. Is this a bug, or am I making a mistake?

Please see the code below:

import matplotlib.pyplot as plt
import pandas as pd
from scipy import stats
import seaborn as sns


def corrfunc(x, y, **kws):
    r, _ = stats.pearsonr(x, y)
    print(x)
    print(y)
    print(r)

iris = sns.load_dataset("iris")
seax = sns.pairplot(iris, height=2, vars=["petal_width", "petal_length", "sepal_width"])
seax.map_lower(corrfunc)
plt.show()

If you change

seax = sns.pairplot(iris, height=2, vars=["petal_width", "petal_length", "sepal_width"])

to

seax = sns.pairplot(iris, hue="sepal_length", height=2, vars=["petal_width", "petal_length", "sepal_width"])

the code breaks, although the plot itself looks fine. Without hue, corrfunc is called 3 times, once for each of the 3 plots in the lower triangle. If I add hue="class" to color the plot by the field class, corrfunc is called about 8 times by map_lower. I don't understand why coloring with hue has an effect on map_lower.

doom4 · Accepted Answer
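
The extra calls come from how hue is handled: as far as I can tell, map_lower calls the function once per hue value for each subplot, and each call only receives the rows with that hue value. The following is a minimal sketch of that behavior without seaborn. The toy data frame and the helper record_call are made up for illustration, and the groupby loop only imitates what the grid does.

```python
import pandas as pd

# Toy stand-in for the iris data; the column names are hypothetical.
df = pd.DataFrame({
    "x": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "y": [2.0, 1.0, 4.0, 3.0, 6.0, 5.0],
    "group": ["a", "a", "a", "b", "b", "c"],
})

calls = []


def record_call(x, y, **kws):
    """Record how many data points a single call receives."""
    calls.append(len(x))


# Without hue: one call that sees every row.
record_call(df["x"], df["y"])

# With hue (imitated by groupby): one call per hue value,
# each with only the rows that share that value.
for level, subset in df.groupby("group"):
    record_call(subset["x"], subset["y"], label=level)

print(calls)  # [6, 3, 2, 1]
```

A correlation computed inside such a call therefore only covers one chunk, and a chunk with a single row is too small for stats.pearsonr.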

So maybe one day this will help somebody who wants to do what I had in mind. Here is my ugly but working solution:

#!/usr/bin/env python
import matplotlib.pyplot as plt
import pandas as pd
from scipy import stats
import seaborn as sns

# Global variables to keep track of the data chunks when hue is
# used to color the data points. With hue, map_lower calls the
# function once per hue value, each time with only that chunk of data.

dataLength = xName = yName = xData = yData = ''


# Function to group data pairs to plot their correlation
def assemble_data_subplot(x, y, **kwargs):
    global xName, yName, xData, yData, dataLength
    if xName == '' and yName == '':
        # First chunk of a new subplot
        xName = x.name
        yName = y.name
        xData = x
        yData = y
    elif xName == x.name and yName == y.name:
        # Further chunk of the same subplot. pd.concat replaces
        # Series.append, which was removed in pandas 2.0.
        xData = pd.concat([xData, x])
        yData = pd.concat([yData, y])

    if len(xData) == dataLength:
        # All chunks collected: annotate and reset for the next subplot
        correlate_data(xData, yData)
        xName = yName = xData = yData = ''


# Correlation function
def correlate_data(xData, yData):
    r, _ = stats.pearsonr(xData, yData)
    r = r**2
    sax = plt.gca()
    sax.annotate("$r^2$={:.2f}".format(r),
                 xy=(.02, .86),
                 xycoords=sax.transAxes)


# Main function to plot the pairwise correlation plot
def main():
    # Declare the global variable so it can be set below
    global dataLength

    # Load the iris example data set as a data frame
    df = sns.load_dataset("iris")

    # Example with hue (height was called size before seaborn 0.9)
    g = sns.pairplot(df, height=2, hue="petal_width",
                     vars=["petal_width",
                           "petal_length",
                           "sepal_width"])

    # Get the number of data entries to check when the assembled data
    # is complete. Used in assemble_data_subplot
    dataLength = len(df)

    # Plot the r^2 value on the lower part of the pair plot
    g.map_lower(assemble_data_subplot)

    # Generate the output
    g.savefig("output.png")
    plt.show()


if __name__ == "__main__":
    main()
