Reputation: 369
plt.hist's density argument does not work.
I tried to use the density argument in the plt.hist function to normalize stock returns in my plot, but it didn't work.
The following code worked fine for me and gave me the probability density function I wanted.
import matplotlib
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(19680801)
# example data
mu = 100 # mean of distribution
sigma = 15 # standard deviation of distribution
x = mu + sigma * np.random.randn(437)
num_bins = 50
plt.hist(x, num_bins, density=1)
plt.show()
But when I tried it with stock data, it simply didn't work. The result gave the unnormalized data. I didn't find any abnormal data in my data array.
import numpy as np
import matplotlib.pyplot as plt
fig = plt.figure()
plt.hist(returns, 50, density=True)
plt.show()
# "returns" is a np array consisting of 360 days of stock returns
Upvotes: 20
Views: 17418
Reputation: 113
At first I also thought that this was an issue. I thought that the tick values shown on the y-axis should not be greater than 1, since a value above 1 would mean the frequency in that bin is greater than the total frequency, which doesn't make any sense.
After thinking for a while, I understood what's really happening. What we expect it to return is the probability of each bin, which is (Observed frequency of a bin) / (Total frequency).
But what Matplotlib returns as density is (Observed frequency of a bin) / (Total frequency * width of the bin). If the width of a bin is much less than 1, the density for that bin can exceed 1. The total area under the histogram is still 1, because sum(density * bin_width) over all bins = sum(each frequency) / (Total frequency) = 1.
So the values you are getting are absolutely fine and make sense too.
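To check this numerically, here is a minimal sketch using np.histogram, which computes the same density values that plt.hist plots. The `returns` array below is a hypothetical stand-in for the asker's data (random numbers with a small spread), not the real stock returns.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-in for 360 daily returns: small values, so bins are narrow.
returns = rng.normal(loc=0.0, scale=0.01, size=360)

# Density values, as plt.hist(..., density=True) would plot them.
density, edges = np.histogram(returns, bins=50, density=True)
# Raw counts over the same bin edges.
counts, _ = np.histogram(returns, bins=edges)
widths = np.diff(edges)

# density = count / (total count * bin width)
manual_density = counts / (counts.sum() * widths)

# The bar heights can exceed 1, but the total area is 1.
area = np.sum(density * widths)
max_density = density.max()
print("area under histogram:", area)
print("largest bar height:", max_density)
```

With bins this narrow, the largest bar height comes out well above 1 while the area stays at 1.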
Upvotes: 1
Reputation: 366
Another approach, besides that of tvbc, is to change the yticks on the plot.
import matplotlib.pyplot as plt
import numpy as np
steps = 10
bins = np.arange(0, 101, steps)
data = np.random.random(100000) * 100
plt.hist(data, bins=bins, density=True)
yticks = plt.gca().get_yticks()
plt.yticks(yticks, np.round(yticks * steps, 2))
plt.show()
Upvotes: 0
Reputation: 43
Since this isn't resolved: @user14518925's response is correct. Matplotlib treats the bin width as a real quantity, whereas, as I understand it, you want each bin to count as having width 1 so that the bar heights themselves sum to 1. More succinctly, what you're seeing is:
\sum_{i}y_{i}\times\text{bin size} =1
Whereas what you want is:
\sum_{i}y_{i} =1
Therefore, all you really need to change is the tick labels on the y-axis. One way to do this is to disable the density option:
density = False
and instead divide the tick labels by the total sample size, as shown here with your example:
import matplotlib
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(19680801)
# example data
mu = 0 # mean of distribution
sigma = 0.0000625 # standard deviation of distribution
x = mu + sigma * np.random.randn(437)
fig = plt.figure()
plt.hist(x, 50, density=False)
locs, _ = plt.yticks()
print(locs)
plt.yticks(locs,np.round(locs/len(x),3))
plt.show()
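A variation on the same idea, in case relabeling ticks feels fragile, is to pass per-sample weights of 1/N to plt.hist so that the bar heights themselves are fractions of the sample. This is only a sketch; the helper name `hist_fractions` is made up for illustration, and the data is the same synthetic example as above.

```python
import numpy as np
import matplotlib.pyplot as plt


def hist_fractions(data, bins=50):
    """Plot a histogram whose bar heights are fractions of the sample.

    Hypothetical helper: each sample gets weight 1/N, so heights sum to 1.
    """
    data = np.asarray(data)
    weights = np.full(data.shape, 1.0 / data.size)
    heights, edges, _ = plt.hist(data, bins=bins, weights=weights)
    plt.ylabel("Fraction of samples")
    return heights, edges


np.random.seed(19680801)
# Same synthetic data as in the example above.
x = 0 + 0.0000625 * np.random.randn(437)
heights, edges = hist_fractions(x, bins=50)
```

Call plt.show() afterwards to display the figure. No bar can be taller than 1 here, regardless of how narrow the bins are.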
Upvotes: 1
Reputation: 39
It is not a bug. The total area of the bars equals 1. The numbers only seem strange because your bin widths are small.
Upvotes: 3
Reputation: 16928
This is a known issue in Matplotlib.
As stated in the bug report "The density flag in pyplot.hist() does not work correctly":
When density = False, the histogram plot would have counts on the Y-axis. But when density = True, the Y-axis does not mean anything useful. I think a better implementation would plot the PDF as the histogram when density = True.
The developers view this as a feature, not a bug, since it maintains compatibility with numpy. They have already closed several bug reports about it as working as intended. Adding to the confusion, the example on the matplotlib site appears to show this feature working, with the y-axis being assigned a meaningful value.
What you want to do with matplotlib is reasonable, but matplotlib will not let you do it that way.
Upvotes: 10