valerio

Reputation: 717

Merging bins in numpy array

I have a histogram stored in an array, with the rightmost edges of the bins in the first column and the corresponding frequencies in the second one. For example:

array([[1.00000000e+00, 9.76765797e-02],
   [2.00000000e+00, 3.26260189e-02],
   [3.00000000e+00, 2.27720518e-03],
   [4.00000000e+00, 1.61188858e-01],
   [5.00000000e+00, 1.23496687e-01],
   [6.00000000e+00, 2.04377586e-01],
   [7.00000000e+00, 7.47678209e-02],
   [8.00000000e+00, 4.67140951e-02],
   [9.00000000e+00, 1.31659099e-01],
   [1.00000000e+01, 1.25216050e-01]])

What is the fastest way to rebin this histogram, for example by taking a bin size of 2.5?

The resulting array should have 2.5, 5.0, 7.5, 10.0 as its first column and the sum of the frequency values in the intervals [0, 2.5], (2.5, 5.0], (5.0, 7.5], (7.5, 10.0] as its second column.

I'm trying to find a compact way to make this transformation but cannot find one.


Edit: As Jakob Stark pointed out, it is not possible to rebin a histogram in general. It is, however, possible to merge bins, for example by doubling or tripling the bin size. How can one do this in a compact way?

I have updated the question's title to reflect the edit.

Upvotes: 0

Views: 1250

Answers (3)

GGranroth

Reputation: 11

I find that vectorized arithmetic is usually a more efficient way of combining bins. This can be done with the indexing features of numpy ndarrays. Here is the above example using strided slices (start:stop:step).

import numpy as np
ar1 = np.array([[1.00000000e+00, 9.76765797e-02],
                 [2.00000000e+00, 3.26260189e-02],
                 [3.00000000e+00, 2.27720518e-03],
                 [4.00000000e+00, 1.61188858e-01],
                 [5.00000000e+00, 1.23496687e-01],
                 [6.00000000e+00, 2.04377586e-01],
                 [7.00000000e+00, 7.47678209e-02],
                 [8.00000000e+00, 4.67140951e-02],
                 [9.00000000e+00, 1.31659099e-01],
                 [1.00000000e+01, 1.25216050e-01]])
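bin_size = 2
# Sum adjacent pairs of bins; these two slices only cover bin_size = 2
weights = ar1[::bin_size, 1] + ar1[1::bin_size, 1]
# Mean of the two original right edges in each pair
bins = (ar1[::bin_size, 0] + ar1[1::bin_size, 0]) / bin_size
new_ar = np.column_stack((bins, weights))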
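
The two-slice sum above covers a merge factor of 2 only. A minimal sketch of the same striding idea for any merge factor, assuming the number of bins is a multiple of that factor. The helper name `merge_bins_strided` is made up for illustration, and unlike the snippet above it keeps the rightmost edge of each group, to match the convention in the question:

```python
import numpy as np

def merge_bins_strided(hist, bin_size):
    """Merge every `bin_size` consecutive bins using strided slices.

    `hist` has right bin edges in column 0 and frequencies in column 1.
    Hypothetical helper; assumes len(hist) is a multiple of bin_size.
    """
    if len(hist) % bin_size != 0:
        raise ValueError("number of bins must be a multiple of bin_size")
    # Slice k picks the k-th member of every group, so adding the
    # slices sums each group element-wise.
    weights = sum(hist[k::bin_size, 1] for k in range(bin_size))
    # The right edge of a merged bin is the right edge of its last member.
    edges = hist[bin_size - 1::bin_size, 0]
    return np.column_stack((edges, weights))

# Toy histogram: right edges 1..10, each bin with frequency 0.1
hist = np.column_stack((np.arange(1.0, 11.0), np.full(10, 0.1)))
merged = merge_bins_strided(hist, 5)
```

Here `merged` has two rows, one per group of five original bins.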

Upvotes: 0

Jakob Stark

Reputation: 3845

You cannot rebin a histogram. When you fill data into a histogram, you lose information (that's in fact often the reason why you want histograms). Unless you still have the original data, there is no way to get a histogram with a different binning.

If you have the original data, you can of course make a new histogram with the desired binning out of it.
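
A minimal sketch of that case, assuming the raw samples are still around. The `samples` array below is made-up stand-in data, not the data behind the histogram in the question:

```python
import numpy as np

# Hypothetical raw samples; in practice this is whatever was originally binned.
rng = np.random.default_rng(0)
samples = rng.uniform(0.0, 10.0, size=1000)

# New bin edges with width 2.5 covering [0, 10].
edges = np.arange(0.0, 10.0 + 2.5, 2.5)
counts, edges = np.histogram(samples, bins=edges)

# Same layout as in the question: right edges, then normalized frequencies.
new_hist = np.column_stack((edges[1:], counts / counts.sum()))
```

Note that `np.histogram` uses half-open bins [a, b), with only the last bin closed on the right, which differs slightly from the (a, b] intervals described in the question.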

Edit: You can merge bins, though. As long as your new bins can be expressed as merged old bins (e.g. by doubling the bin size), you can just add the weights of each contributing bin to the merged bin.

Edit: To double the bin size, for example, you could use

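import numpy as np

n = 2  # merge n bins; the number of bins must be divisible by n
# old_hist is the (N, 2) array from the question: right edges, frequencies
bins, weights = old_hist[:, 0], old_hist[:, 1]
bins = bins.reshape((-1, n))[:, -1]  # right edge of the last bin in each group
weights = np.sum(weights.reshape((-1, n)), axis=1)
new_hist = np.column_stack((bins, weights))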
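
The reshape requires the number of bins to be divisible by n. One possible way around that, sketched here with `np.add.reduceat`, is to let the last group be shorter. The helper name `merge_bins_reduceat` is invented for this example, and it reports the rightmost edge of each group, following the question's convention:

```python
import numpy as np

def merge_bins_reduceat(hist, n):
    """Merge groups of n bins; a shorter final group is allowed.

    Hypothetical helper. `hist` has right edges in column 0 and
    frequencies in column 1.
    """
    starts = np.arange(0, len(hist), n)             # first index of each group
    weights = np.add.reduceat(hist[:, 1], starts)   # sum from each start up to the next
    last = np.minimum(starts + n, len(hist)) - 1    # last index of each group
    return np.column_stack((hist[last, 0], weights))

# Toy histogram: right edges 1..10 with weights 1..10
hist = np.column_stack((np.arange(1.0, 11.0), np.arange(1.0, 11.0)))
merged = merge_bins_reduceat(hist, 3)
```

With ten bins and n = 3 this gives three full groups plus a final group containing only the last bin.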

Upvotes: 2

valerio

Reputation: 717

In the end, I came up with this. Not terribly efficient, though, I'm afraid:

from numpy import array, array_split, asarray, column_stack

data = array([[1.00000000e+00, 9.76765797e-02],
   [2.00000000e+00, 3.26260189e-02],
   [3.00000000e+00, 2.27720518e-03],
   [4.00000000e+00, 1.61188858e-01],
   [5.00000000e+00, 1.23496687e-01],
   [6.00000000e+00, 2.04377586e-01],
   [7.00000000e+00, 7.47678209e-02],
   [8.00000000e+00, 4.67140951e-02],
   [9.00000000e+00, 1.31659099e-01],
   [1.00000000e+01, 1.25216050e-01]])

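bin_size = 2.

x = data[:, 0]
y = data[:, 1]
nbins = int(max(x) / bin_size)  # number of merged bins, as an integer
x_merge = asarray([max(a) for a in array_split(x, nbins)])  # rightmost edge of each group
y_merge = asarray([sum(a) for a in array_split(y, nbins)])  # summed frequencies
out_array = column_stack((x_merge, y_merge))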

Still interested in more efficient/compact ways to do this.
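
One loop-free possibility, sketched under assumptions: each old bin is assigned to the new bin whose interval (a, b] contains its right edge, and the frequencies are summed per new bin with `np.bincount`. This is exact only when every new edge coincides with an old edge. For something like a bin size of 2.5 on unit-width bins, an old bin that straddles a new edge is counted wholly in one new bin, which is an approximation. The helper name `merge_bins_by_edges` is made up for illustration:

```python
import numpy as np

def merge_bins_by_edges(hist, new_edges):
    """Sum the frequencies of old bins whose right edge falls in each new bin.

    Hypothetical helper. `new_edges` are the sorted right edges of the new
    bins; the last one must be >= the largest old right edge.
    """
    new_edges = np.asarray(new_edges, dtype=float)
    if hist[:, 0].max() > new_edges[-1]:
        raise ValueError("new_edges must cover the largest old edge")
    # Index of the first new edge >= each old right edge, i.e. intervals (a, b].
    idx = np.searchsorted(new_edges, hist[:, 0], side="left")
    weights = np.bincount(idx, weights=hist[:, 1], minlength=len(new_edges))
    return np.column_stack((new_edges, weights))

# Toy histogram: right edges 1..10 with weights 1..10, merged in pairs
hist = np.column_stack((np.arange(1.0, 11.0), np.arange(1.0, 11.0)))
merged = merge_bins_by_edges(hist, np.arange(2.0, 12.0, 2.0))
```

Because the grouping is driven by the edge values rather than by position, this also works when the new bins have unequal widths.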

Upvotes: 0
