Apollo

Reputation: 9064

Histogram with equal number of points in each bin

I have a sorted vector points with 100 points. I now want to create two histograms. The first should have 10 bins of equal width. The second should also have 10 bins, but not necessarily of equal width: I just want it to have the same number of points in each bin. So, for example, the first bar might be very short and wide, while the second bar might be very tall and narrow. I have code that creates the first histogram using matplotlib, but I'm not sure how to go about creating the second one.

import matplotlib.pyplot as plt
points = [1,2,3,4,5,6, ..., 99]
n, bins, patches = plt.hist(points, 10)

Edit:

Trying the solution below, I'm a bit puzzled as to why the heights of all of the bars in my histogram are the same.

Upvotes: 14

Views: 15306

Answers (4)

Kiann

Reputation: 571

This solution is not as elegant, but it works for me. Hope it helps.

import numpy as np
import pandas as pd

def pyAC(x, npoints=10, RetType='abs'):
    # npoints is the number of bins; RetType is unused and kept only for compatibility
    x = np.sort(x)
    binCount = len(x) // npoints  # number of data points in each bin
    bins = np.zeros(npoints)      # largest value (upper edge) of each bin
    binsX = np.zeros(npoints)     # mean of the values in each bin
    for i in range(npoints):
        # any leftover points (len(x) % npoints) at the top end are ignored
        chunk = x[i * binCount:(i + 1) * binCount]
        bins[i] = chunk[-1]
        binsX[i] = chunk.mean()
    return pd.DataFrame({'bins': bins, 'binsX': binsX})

Upvotes: 0

Alejandro

Reputation: 3412

Here is an example of how you could get the result. My approach takes the bin edges from the data points themselves and passes them to np.histogram to construct the histogram; hence the need to sort the data using np.argsort(x). The number of points per bin is controlled with npoints. As an example, I construct two histograms using this method. In the first, all points have the same weight, so the height of the histogram is constant (and equal to npoints). In the second, the "weight" of each point is drawn from a uniform random distribution (see the mass array). As expected, the bars of that histogram are no longer equal in height. However, the Poisson error per bin is the same, because every bin still holds the same number of points.

import numpy as np
import matplotlib.pyplot as plt

x = np.random.rand(1000)
mass = np.random.rand(1000)
npoints = 200
ksort = np.argsort(x)

#Here I get the bins from the data set.
#Note that data need to be sorted
bins=x[ksort[0::npoints]]
bins=np.append(bins,x[ksort[-1]])


fig = plt.figure(1,figsize=(10,5))
ax1 = fig.add_subplot(121)
ax2 = fig.add_subplot(122)

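#Histogram where each data point has the same weight
#(drawstyle replaces linestyle='steps-mid', which recent matplotlib no longer accepts)
yhist, xhist = np.histogram(x, bins, weights=None)
ax1.plot(0.5*(xhist[1:]+xhist[:-1]), yhist, drawstyle='steps-mid', lw=2, color='k')

#Histogram where each data point is weighted by its mass
yhist, xhist = np.histogram(x, bins, weights=mass)
ax2.plot(0.5*(xhist[1:]+xhist[:-1]), yhist, drawstyle='steps-mid', lw=2, color='k')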

ax1.set_xlabel('x', size=15)
ax1.set_ylabel('Number of points per bin', size=15)

ax2.set_xlabel('x', size=15)
ax2.set_ylabel('Mass per bin', size=15)
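
The claim about equal counts can be checked numerically without plotting. The following is only a sketch that reuses the binning above; the seeded generator is an assumption added here so the run is reproducible, and it is not part of the original answer.

```python
import numpy as np

rng = np.random.default_rng(0)   # arbitrary seed, for reproducibility only
x = rng.random(1000)
mass = rng.random(1000)
npoints = 200                    # points per bin, as in the code above

# Every npoints-th sorted value becomes a bin edge, plus the maximum
ksort = np.argsort(x)
bins = np.append(x[ksort[0::npoints]], x[ksort[-1]])

counts, _ = np.histogram(x, bins)
mass_per_bin, _ = np.histogram(x, bins, weights=mass)

print(counts)        # every bin holds npoints values
print(mass_per_bin)  # the weighted sums differ from bin to bin
```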

Upvotes: 0

farenorth

Reputation: 10791

This question is similar to one that I wrote an answer to a while back, but sufficiently different to warrant its own question. The solution, it turns out, uses basically the same code as my other answer.

import numpy as np
import matplotlib.pyplot as plt

def histedges_equalN(x, nbin):
    npt = len(x)
    return np.interp(np.linspace(0, npt, nbin + 1),
                     np.arange(npt),
                     np.sort(x))

x = np.random.randn(100)
n, bins, patches = plt.hist(x, histedges_equalN(x, 10))

This solution gives a histogram with equal height bins because, by definition, a histogram is a count of the number of points in each bin.

To get a pdf (i.e. density function), use the density=True kwarg to plt.hist (called normed=True in older matplotlib versions), as described in my other answer.
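
To see why the density version has bars of different heights, here is a small numerical sketch. It uses np.histogram rather than plt.hist so that it runs without a display, and the seed and variable names are assumptions made for illustration.

```python
import numpy as np

def histedges_equalN(x, nbin):
    # Same edge construction as in the answer above
    npt = len(x)
    return np.interp(np.linspace(0, npt, nbin + 1),
                     np.arange(npt),
                     np.sort(x))

rng = np.random.default_rng(1)   # arbitrary seed, for reproducibility only
x = rng.standard_normal(100)
edges = histedges_equalN(x, 10)

counts, _ = np.histogram(x, edges)
density, _ = np.histogram(x, edges, density=True)
areas = density * np.diff(edges)   # bar height times bar width

print(counts)   # raw counts: the same in every bin
print(density)  # heights: taller where the bins are narrow
print(areas)    # each bar covers the same fraction of the data
```

With density scaling, each bar's area (rather than its height) is what stays constant, so narrow bins come out tall and wide bins come out short.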

Upvotes: 23

jlarsch

Reputation: 2317

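Provide bins to the histogram:

# integer step (// is needed in Python 3); the maximum is appended as the last edge
bins = list(points[0::len(points)//10]) + [points[-1]]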

and then

n, bins, patches = plt.hist(points, bins=bins)

(provided points is sorted)
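
A related way to get such edges, which is not what this answer does, is to take them from the quantiles of the data with np.quantile. The sketch below assumes randomly generated stand-in data for points and checks the counts with np.histogram; the same edges could be passed to plt.hist via bins=edges.

```python
import numpy as np

rng = np.random.default_rng(2)   # arbitrary seed; stand-in data for `points`
points = np.sort(rng.standard_normal(100))
nbin = 10

# Edges at the 0%, 10%, ..., 100% quantiles of the data
edges = np.quantile(points, np.linspace(0, 1, nbin + 1))
counts, _ = np.histogram(points, edges)
print(counts)   # number of points falling in each bin
```

Unlike slicing, this does not require points to be sorted, and it also works when the number of points is not a multiple of the number of bins.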

Upvotes: 1
