Reputation: 695
When I define hue to color my plot, map_lower calls its function more often and loses data compared to the equivalent call without hue. Is this a bug, or am I making a mistake?
Please see the code below:
import matplotlib.pyplot as plt
import pandas as pd
from scipy import stats
import seaborn as sns
def corrfunc(x, y, **kws):
    r, _ = stats.pearsonr(x, y)
    print(x)
    print(y)
    print(r)
iris = sns.load_dataset("iris")
seax = sns.pairplot(iris, size=2, vars=["petal_width", "petal_length", "sepal_width"])
seax.map_lower(corrfunc)
plt.show()
If you change
sns.pairplot(iris, size=2, vars=["petal_width", "petal_length", "sepal_width"])
to
seax = sns.pairplot(iris, hue="sepal_length", size=2, vars=["petal_width", "petal_length", "sepal_width"])
the code breaks, although the plot still looks fine. Without hue, corrfunc is called 3 times, once for each of the 3 plots in the lower triangle. If I add hue="class" to color the plot by the field class, map_lower calls corrfunc 8 times or so. I don't understand why coloring with hue has an effect on map_lower.
Upvotes: 0
Views: 1185
Reputation: 695
So maybe one day this will help somebody who wants to do what I had in mind. Here is my ugly but working solution:
#!/usr/bin/env python
import matplotlib.pyplot as plt
import pandas as pd
from scipy import stats
import seaborn as sns

# Global variables to keep track of data chunks if you
# use hue to color the data points. With hue, map_lower
# passes the data in chunks of identical hue values
dataLength = xName = yName = xData = yData = ''


# Function to group data pairs to plot their correlation
def assemble_data_subplot(x, y, **kwargs):
    global xName, yName, xData, yData, dataLength
    if xName == '' and yName == '':
        # First chunk of a new subplot
        xName = x.name
        yName = y.name
        xData = x
        yData = y
    elif xName == x.name and yName == y.name:
        # Further chunk of the same subplot
        # (pd.concat replaces Series.append, which was removed in pandas 2.0)
        xData = pd.concat([xData, x])
        yData = pd.concat([yData, y])
    if len(xData) == dataLength:
        # All chunks collected: correlate, then reset for the next subplot
        correlate_data(xData, yData)
        xName = yName = xData = yData = ''
# Correlation function
def correlate_data(xData, yData):
    r, _ = stats.pearsonr(xData, yData)
    r = r**2
    sax = plt.gca()
    sax.annotate("$r^2$={:.2f}".format(r),
                 xy=(.02, .86),
                 xycoords=sax.transAxes)
# Main function to plot the pairwise correlation plot
def main():
    # Init global variable to set it later
    global dataLength
    # Load the example data set into a data frame
    df = sns.load_dataset("iris")
    # Example with hue
    # (seaborn 0.9 renamed size to height; use height=2 on newer versions)
    g = sns.pairplot(df, size=2, hue="petal_width",
                     vars=["petal_width",
                           "petal_length",
                           "sepal_width"])
    # Get the number of data entries to check when the assembled data
    # is complete. Used in assemble_data_subplot
    dataLength = len(df)
    # Plot the r^2 value on the lower part of the pair plot
    g.map_lower(assemble_data_subplot)
    # Generate the output
    g.savefig("output.png")
    plt.show()


if __name__ == "__main__":
    main()
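The same bookkeeping can also be done without module-level globals. The following is only a sketch of one possible variant, not something tested against every seaborn version: ChunkAccumulator, on_complete and NamedList are hypothetical names made up for illustration. It assumes, as the solution above does, that every chunk carries a .name attribute and that the chunks of one subplot arrive one after another. Unlike the version above, it discards leftovers when a new variable pair starts.

```python
class ChunkAccumulator:
    """Collect per-hue chunks and fire a callback once a subplot is complete."""

    def __init__(self, total_rows, on_complete):
        self.total_rows = total_rows      # expected number of points per subplot
        self.on_complete = on_complete    # called as on_complete(names, xs, ys)
        self._reset()

    def _reset(self):
        self.names = None
        self.x_values = []
        self.y_values = []

    def __call__(self, x, y, **kwargs):
        names = (x.name, y.name)
        if self.names != names:
            # A new variable pair started: drop any incomplete leftovers.
            self._reset()
            self.names = names
        self.x_values.extend(x)
        self.y_values.extend(y)
        if len(self.x_values) == self.total_rows:
            # Every chunk has arrived: hand over the full data and start fresh.
            self.on_complete(names, self.x_values, self.y_values)
            self._reset()


class NamedList(list):
    """Tiny stand-in for a pandas Series: a list with a .name attribute."""

    def __init__(self, values, name):
        super().__init__(values)
        self.name = name


completed = []
acc = ChunkAccumulator(
    total_rows=3,
    on_complete=lambda names, xs, ys: completed.append((names, xs, ys)),
)

# Two chunks, as they would arrive for two different hue values.
acc(NamedList([1.0, 2.0], "a"), NamedList([2.0, 4.0], "b"), label="u")
acc(NamedList([3.0], "a"), NamedList([6.0], "b"), label="v")
print(completed)
```

With seaborn, an instance would presumably be passed as g.map_lower(acc) with total_rows=len(df) and an on_complete callback that computes the correlation and annotates plt.gca(). Like the original, it only fires if the chunk lengths add up to exactly total_rows.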
Upvotes: 1
Reputation: 2471
Looking at the code that defines map_lower, we see the following (to be more concise, I left out quite a few bits that are not relevant to the answer):
def map_lower(self, func, **kwargs):
    # irrelevant parts left out
    for k, label_k in enumerate(self.hue_names):
        # some more irrelevant parts (specifying colours and what not)
        func(data_k[x_var], data_k[y_var], label=label_k,
             color=color, **kwargs)
    return self
So basically, the func given to map_lower is run once for every unique hue value that is present (for each pair of variables). When no hue is given, func is run only once on all the relevant data (for each pair of variables). Hence the difference in the number of calls to func between using hue and not using it.
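The effect can be reproduced without seaborn. The following is a minimal sketch, assuming a simplified model of the loop above rather than seaborn's actual implementation. The names dispatch and record and the sample rows are hypothetical and exist only for illustration.

```python
from collections import defaultdict


def dispatch(rows, x_var, y_var, func, hue=None):
    """Simplified stand-in for the per-hue loop: call func once per hue level."""
    if hue is None:
        # No hue: a single call that receives every row.
        func([r[x_var] for r in rows], [r[y_var] for r in rows], label=None)
        return
    groups = defaultdict(list)
    for r in rows:
        groups[r[hue]].append(r)
    # With hue: one call per unique hue value, each seeing only its own rows.
    for label, group in groups.items():
        func([r[x_var] for r in group], [r[y_var] for r in group], label=label)


calls = []


def record(x, y, label=None, **kwargs):
    # Remember which hue level was passed and how many points it contained.
    calls.append((label, len(x)))


rows = [
    {"a": 1.0, "b": 2.0, "kind": "u"},
    {"a": 2.0, "b": 4.1, "kind": "u"},
    {"a": 3.0, "b": 5.9, "kind": "v"},
    {"a": 4.0, "b": 8.2, "kind": "w"},
]

dispatch(rows, "a", "b", record)
calls_without_hue = list(calls)

calls.clear()
dispatch(rows, "a", "b", record, hue="kind")
calls_with_hue = list(calls)

print(calls_without_hue)  # [(None, 4)]
print(calls_with_hue)     # [('u', 2), ('v', 1), ('w', 1)]
```

Each call made with hue sees only the points of one hue value, so a function that computes a correlation per call works on a subset of the data. A group with fewer than two points cannot yield a correlation at all, which could explain why a near-continuous hue column makes such a function fail.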
Upvotes: 0