Reputation: 33
I have looked for solutions already but could not find one that worked for my problem. I am trying to plot a histogram with a density function, with the density shown on the y-axis. meanopa contains the average log returns of the S&P 500.
What I do not understand is the following passage from the distplot documentation:
norm_hist : bool, optional If True, the histogram height shows a density rather than a count. This is implied if a KDE or fitted density is plotted.
Since kde=True in my case, I am wondering why there is the number of observations on the y-axis.
import seaborn as sns
sns.distplot(meanopa, hist=True, kde=True, bins=20, color='darkblue',
             hist_kws={'edgecolor': 'black'}, kde_kws={'linewidth': 4})
Thanks in advance and again I would appreciate any sort of support.
Cheers!
Upvotes: 2
Views: 6896
Reputation: 23
If you are interested not in the probability density function but in the probability (relative frequency) of each bin, i.e. the count of samples in the bin divided by the total number of samples, you can use the 'weights' entry of the hist_kws parameter. Applying this to the example code of lrnzcig,
import random
import numpy as np
import seaborn as sns

random.seed(2)
min_rescale = -0.001
max_rescale = 0.001
close2 = [min_rescale + random.random() * (max_rescale - min_rescale) for x in range(100)]
# Weighting every sample by 1/N makes each bar show the fraction of samples in its bin
sns.distplot(close2, hist=True, kde=False, bins=5, color='darkblue',
             hist_kws={'edgecolor': 'black', 'weights': np.ones(len(close2)) / len(close2)})
results in the following plot:
[Figure: probabilities of histogram bins using seaborn's distplot]
Note that the result is not a probability density function; instead, the bin heights sum to 1 regardless of the bin widths. However, this makes no sense when you are also plotting a KDE.
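To check the claim that the weighted bin heights sum to 1, here is a minimal sketch that reproduces the binning with np.histogram instead of plotting. It assumes the same close2 data and bins=5 as above; the names weights, probs and total are only illustrative.

```python
import random
import numpy as np

random.seed(2)
min_rescale = -0.001
max_rescale = 0.001
close2 = [min_rescale + random.random() * (max_rescale - min_rescale) for _ in range(100)]

# Each sample contributes 1/N, so a bin's height is the fraction of samples in it
weights = np.ones(len(close2)) / len(close2)
probs, edges = np.histogram(close2, bins=5, weights=weights)

total = float(probs.sum())
print(probs)  # per-bin probabilities
print(total)  # sums to 1 up to floating-point rounding, whatever the bin widths are
```

Because every sample falls into exactly one bin, the sum does not depend on the scale of the x-axis, unlike the density values.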
Upvotes: 0
Reputation: 3947
Your result is fine. The y-axis shows not the histogram counts but the probability density (more precisely, the kernel density estimate). Because your values are very small, the x-axis spans a very narrow interval, so the density values have to be large. From your plot, if you take a rectangle of 0.002 x 500 to approximate the total area under the curve, the result is around 1, as expected for a probability density.
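The area argument can be checked numerically. The sketch below is an illustration only, assuming uniform random data on [-0.001, 0.001] rather than the original meanopa values; it uses np.histogram with density=True to compute density-normalized bar heights and then sums height times width.

```python
import random
import numpy as np

random.seed(2)
min_rescale = -0.001
max_rescale = 0.001
close2 = [min_rescale + random.random() * (max_rescale - min_rescale) for _ in range(100)]

# density=True rescales the bar heights so that sum(height * width) equals 1
heights, edges = np.histogram(close2, bins=5, density=True)
widths = np.diff(edges)
area = float(np.sum(heights * widths))

print(widths)   # each bin is only about 0.0004 wide
print(heights)  # so the heights must be large for the area to reach 1
print(area)     # total area under the bars
```

Large y-axis values are therefore a consequence of the narrow x-range and do not mean that counts are being shown.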
As a note, here is a reproducible version of your problem. You can play with the rescaling (the min_rescale and max_rescale values) if you want to see how the shape of the probability density changes.
import random
import seaborn as sns

random.seed(2)
min_rescale = -0.001
max_rescale = 0.001
# 100 uniform random values in [min_rescale, max_rescale)
close2 = [min_rescale + random.random() * (max_rescale - min_rescale) for x in range(100)]
sns.distplot(close2, hist=True, kde=True, bins=5, color='darkblue',
             hist_kws={'edgecolor': 'black'}, kde_kws={'linewidth': 4})
Upvotes: 1