Rho Phi

Reputation: 1240

histogram misses values in matplotlib, bug?

import numpy as np
import matplotlib.pyplot as plt

bins=np.arange(0,1,0.1)
discrete_pdf=np.power(bins,1.5)
discrete_pdf=discrete_pdf/np.sum(discrete_pdf)
print(bins)
print(discrete_pdf)
print(np.sum(discrete_pdf))
plt.plot(bins,discrete_pdf)
plt.show()
values=np.random.choice(bins, 100000, p=discrete_pdf)
plt.hist(values,10)
plt.show()

Am I using hist incorrectly, or is this a "feature"/bug?

If you force the hist function to make 10*n bins (e.g. 20 or 100), the plot looks reasonable, but it has empty spaces due to the finer binning.


Upvotes: 0

Views: 629

Answers (1)

DavidG

Reputation: 25362

This happens because plt.hist(values,10) lets matplotlib determine the bins for you: the second argument is the number of bins, and they are spread evenly between the smallest and largest value in the data. Since 0 has probability 0 in your pdf, it is never drawn, so the data runs from 0.1 to 0.9. The edges of the 10 automatic bins are therefore:

[0.1  0.18 0.26 0.34 0.42 0.5  0.58 0.66 0.74 0.82 0.9 ]
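To check those edges yourself, one option is to call np.histogram directly, since it returns both the counts and the bin edges. This is only a minimal sketch: the fixed seed and the names rng, support and pmf are my own choices for the illustration, not part of the question's code.

```python
import numpy as np

# Illustrative names and seed (assumptions, not from the question)
rng = np.random.default_rng(0)
support = np.arange(0, 1, 0.1)   # the discrete values 0.0, 0.1, ..., 0.9
pmf = np.power(support, 1.5)
pmf = pmf / pmf.sum()            # normalise so the probabilities sum to 1
values = rng.choice(support, 100000, p=pmf)

# An integer bins argument gives equal-width bins between values.min() and values.max()
counts, edges = np.histogram(values, bins=10)
print(edges)    # 11 edges running from 0.1 to 0.9
print(counts)   # 10 counts, one of which is zero
```

With 9 distinct values spread over 10 bins, one of the counts comes out as zero, which is the gap you see in the plot.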

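The automatic bins are 0.08 wide while your values are spaced 0.1 apart, so the 9 distinct values are spread over 10 bins and one bin is left empty. You can pass in custom bins rather than letting matplotlib decide them automatically. The solution is therefore to pass in a list (or array) of bin edges, plotting_bins = np.arange(0,1.1,0.1). Note that an extra edge has been added on the end, because 10 bins need 11 edges.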

import numpy as np
import matplotlib.pyplot as plt

bins=np.arange(0,1,0.1)

discrete_pdf=np.power(bins,1.5)
discrete_pdf=discrete_pdf/np.sum(discrete_pdf)
values=np.random.choice(bins, 100000, p=discrete_pdf)

plotting_bins=np.arange(0,1.1,0.1) # bin edges: one more edge than the number of bins

fig, (ax1,ax2) = plt.subplots(1,2,figsize=(6,4))

ax1.hist(values, 10)
ax1.set_title("Automatic bins")
ax2.hist(values, bins=plotting_bins)
ax2.set_title("Manual bins")

ax1.set_xlim(0,1)
ax2.set_xlim(0,1)

plt.tight_layout()
plt.show()


If you want to know how the automatic bins are created when you provide an integer for the bins argument, we can look at the documentation of numpy.histogram() (which is what plt.hist() uses behind the scenes):

bins : int or sequence of scalars or str, optional

If bins is an int, it defines the number of equal-width bins in the given range

and

range : (float, float), optional

The lower and upper range of the bins. If not provided, range is simply (a.min(), a.max()).
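Going by the documentation quoted above, another way to get 0.1-wide bins should be to keep the integer bin count and pass range explicitly. A small sketch using np.histogram, with illustrative names (rng, support, pmf) and a fixed seed that are my own assumptions; plt.hist accepts a range argument as well.

```python
import numpy as np

# Illustrative names and seed (assumptions, not from the question)
rng = np.random.default_rng(0)
support = np.arange(0, 1, 0.1)
pmf = np.power(support, 1.5)
pmf = pmf / pmf.sum()
values = rng.choice(support, 100000, p=pmf)

# 10 equal-width bins over the explicit range (0, 1) instead of (min, max)
counts, edges = np.histogram(values, bins=10, range=(0, 1))
print(edges)    # 11 edges from 0.0 to 1.0 in steps of 0.1
print(counts)   # first bin is empty because 0 is never drawn
```

The equivalent plotting call would be plt.hist(values, bins=10, range=(0, 1)).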

Upvotes: 1
