CLat

Reputation: 33

Seaborn Distplot with Density on y-axis

I have already looked for solutions but could not find one that works for my problem. I am trying to plot a histogram together with a density curve, with the density shown on the y-axis. meanopa holds the average log returns of the S&P 500. What I do not understand is the following passage from the distplot documentation:

norm_hist : bool, optional If True, the histogram height shows a density rather than a count. This is implied if a KDE or fitted density is plotted.

Since kde=True in my case, I am wondering why there is the number of observations on the y-axis.

sns.distplot(meanopa, hist=True, kde=True, bins=20, color = 'darkblue',
             hist_kws={'edgecolor':'black'}, kde_kws={'linewidth': 4})

Thanks in advance and again I would appreciate any sort of support.

Cheers!

Upvotes: 2

Views: 6896

Answers (2)

Karsten Leonhardt

Reputation: 23

If you are interested not in the probability density function but in the probability (relative frequency) of each bin, i.e. the number of samples in the bin divided by the total number of samples, you can pass a 'weights' entry in the hist_kws parameter. Applying this to lrnzcig's example code,

import random
import numpy as np
import seaborn as sns

random.seed(2)
min_rescale = -0.001
max_rescale = 0.001
close2 = [min_rescale + random.random() * (max_rescale - min_rescale) for x in range(100)]
# Weight each sample by 1/N so that the bar heights are relative frequencies.
sns.distplot(close2, hist=True, kde=False, bins=5, color='darkblue',
             hist_kws={'edgecolor': 'black', 'weights': np.ones(len(close2)) / len(close2)})

results in the following plot:

[Figure: probabilities of histogram bins using seaborn's distplot]

Note that the result is not a probability density function. Instead, the bar heights sum to 1, regardless of the x-values or widths of the bins. For the same reason, this scaling makes no sense when you also plot a KDE.
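
To make the effect of the weights concrete, here is a minimal sketch in plain Python that computes the same per-bin probabilities without any plotting. The helper name bin_probabilities and the equal-width binning between the minimum and maximum of the data are assumptions made for illustration, not part of seaborn.

```python
import random


def bin_probabilities(data, bins):
    """Return the share of samples in each of `bins` equal-width bins.

    Hypothetical helper for illustration only; it mimics weighting
    every sample by 1 / len(data) in a histogram.
    """
    lo, hi = min(data), max(data)
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in data:
        # Clamp so that the maximum value lands in the last bin.
        idx = min(int((x - lo) / width), bins - 1)
        counts[idx] += 1
    return [c / len(data) for c in counts]


random.seed(2)
close2 = [-0.001 + random.random() * 0.002 for _ in range(100)]

probs = bin_probabilities(close2, bins=5)
print(probs)       # one relative frequency per bin
print(sum(probs))  # the bar heights add up to 1 (up to rounding)
```

The bar heights here depend only on how many samples fall into each bin, not on how wide the bins are, which is why they cannot be compared with a KDE curve.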

Upvotes: 0

lrnzcig

Reputation: 3947

Your result is fine. The y-axis does not show the histogram counts but the probability density (more precisely, the kernel density estimate). Since your numbers are very small, the x-axis covers a very narrow interval. In fact, if you approximate the total area under the curve in your plot by a rectangle of 0.002 x 500, the area comes out at around 1, as expected for a probability density.

As a note, here is a reproducible version of your problem. You can play with the rescaling (the min_rescale and max_rescale values) if you want to see how the shape of the probability density changes.

import random
import seaborn as sns

random.seed(2)
min_rescale = -0.001
max_rescale = 0.001
close2 = [min_rescale + random.random() * (max_rescale - min_rescale) for x in range(100)]
sns.distplot(close2, hist=True, kde=True, bins=5, color='darkblue',
             hist_kws={'edgecolor': 'black'}, kde_kws={'linewidth': 4})
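
The area argument can also be checked numerically. The following is a rough sketch in plain Python that normalizes a histogram the way a density histogram is normalized (count divided by sample size times bin width) and sums up the bar areas. The helper name density_histogram is an assumption made for illustration and is not a seaborn function.

```python
import random


def density_histogram(data, bins):
    """Return (edges, heights) with heights = count / (n * bin_width).

    Hypothetical helper for illustration only, using equal-width bins
    between the minimum and maximum of the data.
    """
    lo, hi = min(data), max(data)
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in data:
        # Clamp so that the maximum value lands in the last bin.
        idx = min(int((x - lo) / width), bins - 1)
        counts[idx] += 1
    n = len(data)
    heights = [c / (n * width) for c in counts]
    edges = [lo + i * width for i in range(bins + 1)]
    return edges, heights


random.seed(2)
min_rescale, max_rescale = -0.001, 0.001
close2 = [min_rescale + random.random() * (max_rescale - min_rescale)
          for _ in range(100)]

edges, heights = density_histogram(close2, bins=5)
bin_width = edges[1] - edges[0]
area = sum(h * bin_width for h in heights)

print(f"bin width:   {bin_width:.6f}")
print(f"tallest bar: {max(heights):.1f}")
print(f"total area:  {area:.6f}")
```

Because the data span an interval of at most 0.002, the bars have to be several hundred units tall for the total area to equal 1. Widening the interval via min_rescale and max_rescale lowers the bars accordingly.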

Upvotes: 1