Reputation: 9064
I have a sorted vector points with 100 points. I now want to create two histograms. The first should have 10 bins of equal width. The second should also have 10 bins, but not necessarily of equal width; instead, each bin should contain the same number of points. So, for example, the first bar might be very short and wide, while the second bar might be very tall and narrow. I have code that creates the first histogram using matplotlib, but I'm not sure how to go about creating the second one.
import matplotlib.pyplot as plt
points = [1,2,3,4,5,6, ..., 99]
n, bins, patches = plt.hist(points, 10)
Edit:
Trying the solution below, I'm a bit puzzled as to why the heights of all of the bars in my histogram are the same.
Upvotes: 14
Views: 15306
Reputation: 571
This solution is not as elegant, but it works for me. Hope it helps.
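import numpy as np
import pandas as pd

def pyAC(x, npoints=10, RetType='abs'):
    # npoints is the number of bins; RetType is unused and kept only for compatibility
    x = np.sort(x)
    binCount = len(x) // npoints  # number of data points in each bin
    bins = np.zeros(npoints)      # largest value in each bin (its upper edge)
    binsX = np.zeros(npoints)     # mean of the points in each bin
    for i in range(npoints):
        # slice out the i-th group of binCount sorted points
        chunk = x[i * binCount:(i + 1) * binCount]
        bins[i] = chunk[-1]       # avoids indexing past the end for the last bin
        binsX[i] = chunk.mean()   # averages every point in the bin, including the first
    return pd.DataFrame({'bins': bins, 'binsX': binsX})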
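The same per-bin grouping can also be written without an explicit loop. The following is only a sketch under stated assumptions: equal_count_bin_means is a hypothetical helper name, and any trailing points that do not fill a whole bin are simply dropped.

```python
import numpy as np

def equal_count_bin_means(x, nbins=10):
    """Return (upper_edges, means) for nbins groups of equally many sorted points.

    Hypothetical helper for illustration; leftover points that do not
    fill a complete bin are dropped.
    """
    x = np.sort(np.asarray(x, dtype=float))
    per_bin = len(x) // nbins
    # One row per bin, each row holding per_bin consecutive sorted values
    chunks = x[:per_bin * nbins].reshape(nbins, per_bin)
    return chunks[:, -1], chunks.mean(axis=1)

edges, means = equal_count_bin_means(np.arange(100), nbins=10)
```

Reshaping the sorted data into one row per bin makes the "same number of points in each bin" property explicit in the array shape.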
Upvotes: 0
Reputation: 3412
Here is an example of how you could get the result. My approach uses the data points themselves to get the bin edges that are passed to np.histogram to construct the histogram, hence the need to sort the data using np.argsort(x). The number of points per bin is controlled by npoints. As an example, I construct two histograms using this method. In the first, all points have the same weight, so the height of the histogram is constant (and equal to npoints). In the second, the "weight" of each point is drawn from a uniform random distribution (see the mass array). As expected, the bars of the histogram are no longer equal. However, the Poisson error per bin is the same, because every bin still contains the same number of points.
import numpy as np
import matplotlib.pyplot as plt

x = np.random.rand(1000)
mass = np.random.rand(1000)
npoints = 200
ksort = np.argsort(x)
#Here I get the bins from the data set.
#Note that data need to be sorted
bins=x[ksort[0::npoints]]
bins=np.append(bins,x[ksort[-1]])
fig = plt.figure(1,figsize=(10,5))
ax1 = fig.add_subplot(121)
ax2 = fig.add_subplot(122)
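#Histogram where each data point has the same weight
#(drawstyle is used because recent matplotlib no longer accepts linestyle='steps-mid')
yhist, xhist = np.histogram(x, bins, weights=None)
ax1.plot(0.5*(xhist[1:]+xhist[:-1]), yhist, drawstyle='steps-mid', lw=2, color='k')
#Histogram where each data point is weighted by its mass
yhist, xhist = np.histogram(x, bins, weights=mass)
ax2.plot(0.5*(xhist[1:]+xhist[:-1]), yhist, drawstyle='steps-mid', lw=2, color='k')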
ax1.set_xlabel('x', size=15)
ax1.set_ylabel('Number of points per bin', size=15)
ax2.set_xlabel('x', size=15)
ax2.set_ylabel('Mass per bin', size=15)
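As an alternative to slicing the sorted array, the bin edges can be taken from quantiles of the data. This is a different technique from the one in this answer, shown only as a sketch; the seed and the choice of five bins are assumptions made for the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)   # seed chosen only for reproducibility
x = rng.random(1000)
nbins = 5

# Edges at the 0%, 20%, ..., 100% quantiles of the data
edges = np.quantile(x, np.linspace(0, 1, nbins + 1))

# Each bin should then hold (very nearly) the same number of points
counts, _ = np.histogram(x, bins=edges)
print(counts)
```

The quantile form does not need an explicit sort, and it still gives nearly equal counts when the number of points is not a multiple of the number of bins.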
Upvotes: 0
Reputation: 10791
This question is similar to one that I wrote an answer to a while back, but sufficiently different to warrant its own question. The solution, it turns out, uses basically the same code from my other answer.
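import numpy as np
import matplotlib.pyplot as plt

def histedges_equalN(x, nbin):
    # Interpolate along the sorted data to find nbin + 1 edges
    # that enclose equal numbers of points
    npt = len(x)
    return np.interp(np.linspace(0, npt, nbin + 1),
                     np.arange(npt),
                     np.sort(x))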
x = np.random.randn(100)
n, bins, patches = plt.hist(x, histedges_equalN(x, 10))
This solution gives a histogram with bins of equal height because, by definition, a histogram is a count of the number of points in each bin.
To get a PDF (i.e. a density function), pass density=True to plt.hist (older matplotlib releases called this kwarg normed=True), as described in my other answer.
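To make the difference between counts and density concrete, here is a small sketch using np.histogram, which takes the same bins and density arguments as plt.hist. The helper is repeated from above so that the snippet runs on its own; the seed is an assumption made for reproducibility.

```python
import numpy as np

def histedges_equalN(x, nbin):
    # Same helper as above, repeated so this sketch is self-contained
    npt = len(x)
    return np.interp(np.linspace(0, npt, nbin + 1), np.arange(npt), np.sort(x))

rng = np.random.default_rng(1)   # seed chosen only for reproducibility
x = rng.standard_normal(100)
edges = histedges_equalN(x, 10)

# Raw counts: every bin holds the same number of points, so the bars are flat
counts, _ = np.histogram(x, bins=edges)

# Density: count / (total * bin width), so narrow bins become tall bars
density, _ = np.histogram(x, bins=edges, density=True)
widths = np.diff(edges)
print(counts)
print(density * widths)   # area of each bar
```

With density=True each bar has the same area rather than the same height, which is what produces the tall narrow bars and short wide bars described in the question.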
Upvotes: 23
Reputation: 2317
Provide the bin edges to the histogram:
bins = points[0::len(points)//10]  # // keeps the slice step an integer in Python 3
and then
n, bins, patches = plt.hist(points, bins=bins)
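(provided points is sorted). Note that the slice never reaches the largest value, so append points[-1] to bins if the top values should fall inside the final bin.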
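A quick way to check the slicing approach is to count the points per bin with np.histogram. This is a sketch under assumptions: the data is a stand-in array 1..100 (not the original points), its length is a multiple of 10, and the final edge is added explicitly.

```python
import numpy as np

points = np.arange(1, 101)     # stand-in for the sorted data: 1, 2, ..., 100
step = len(points) // 10       # integer step for the slice

# Every step-th sorted value is a left edge; the maximum closes the last bin
edges = np.append(points[0::step], points[-1])

counts, _ = np.histogram(points, bins=edges)
print(edges)
print(counts)
```

The same edges array can be passed as bins to plt.hist.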
Upvotes: 1