Reputation: 31
I have a question about the y-axis of the histograms that a default seaborn pairplot generates.
Here is some example code:
import pandas as pd
import seaborn as sns
import numpy as np
data = [np.random.random_sample(20), np.random.random_sample(20)]
dataFrame = pd.DataFrame(data=zip(*data))
g = sns.pairplot(dataFrame)
g.savefig("test.png", dpi=100)
What is the unit of the y-axis for the histograms on the diagonal? How can I read the height of a bin in this view?
Thank you very much,
Chris
Upvotes: 3
Views: 2252
Reputation: 40737
By default, pairplot uses the diagonal to "show the univariate distribution of the data for the variable in that column" (http://stanford.edu/~mwaskom/software/seaborn/generated/seaborn.pairplot.html).
So each bar represents the count of values in the corresponding bin (whose range you can read from the x-axis). The y-axis tick labels, however, do not show the actual count: the visible axis belongs to the scatterplots in that row, so its labels are in the units of that row's variable.
I could not get the data from the PairGrid object that pairplot returns, but unless you specify otherwise, seaborn uses plt.hist() to draw that diagonal, so you can get the data using:
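import matplotlib.pyplot as plt
# %matplotlib inline  # uncomment when running in a Jupyter notebook
import pandas as pd
import seaborn as sns
import numpy as np
data = [np.random.random_sample(20), np.random.random_sample(20)]
dataFrame = pd.DataFrame(data=list(zip(*data)))
g = sns.pairplot(dataFrame)
# for the first variable:
plt.figure()  # draw on a new figure so the pairplot is left untouched
# c = counts per bin, b = bin edges, p = the drawn bar patches
c, b, p = plt.hist(dataFrame.iloc[:, 0])
print(c)
# e.g. [ 3. 6. 0. 2. 3. 0. 1. 3. 1. 1.] (values vary, the data is random)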
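If you only need the numbers and not another plot, a minimal sketch is to call np.histogram() directly. This assumes the default of 10 equal-width bins, which is also the plt.hist() default. The seeded generator and the names values, counts and edges are only for illustration; they stand in for one column of the DataFrame.

```python
import numpy as np

# Seeded only so the example is repeatable; any 1-D array works here.
rng = np.random.default_rng(0)
values = rng.random(20)  # stand-in for one column of the DataFrame

# 10 equal-width bins between the min and max of the data.
counts, edges = np.histogram(values, bins=10)

# Each bin is half-open [left, right), except the last one,
# which also includes its right edge.
for left, right, n in zip(edges[:-1], edges[1:], counts):
    print(f"{left:.3f} to {right:.3f}: {n}")
```

The counts are the bar heights, and the edges give the x-range of each bar, so you can match them against what the diagonal shows.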
Upvotes: 6