earnric
earnric

Reputation: 534

Normalize histogram2d by bin area

I have a 2D histogram that I generate with numpy:

H, xedges, yedges = np.histogram2d(y, x, weights=mass * (1.0 - pf),
                                   bins=(yrange,xrange))

Note that I'm currently weighing the bins with a function of mass (mass is a numpy array with the same dimensions as x and y). The bins are logarithmic (generated via xrange = np.logspace(minX, maxX, 100)).

However, I really want to weight the bins by the mass function but normalize them (i.e. divide) each by the area of the bin: e.g. - each bin has area xrange[i] * yrange[i]. However, since xrange and yrange don't have the same dimensions as mass, x and y ... I can't simply put the normalization in the np.histogram2d call.

How can I normalize the bin counts by the area in each log bin?

For reference, here's the plot (I've added x and y 1D histograms that I'll also need to normalize by the width of the bin, but once I figure out how to do it for 2D it should be analogous).

FYI - I generate the main (and axes-histograms) with matplotlib:

X,Y=np.meshgrid(xrange,yrange)
H = np.log10(H)
masked_array = np.ma.array(H, mask=np.isnan(H))  # mask out all nan, i.e. log10(0.0)
cax = (ax2dhist.pcolormesh(X,Y,masked_array, cmap=cmap, norm=LogNorm(vmin=1,vmax=8)))

Phase diagram relating star particle metals to the fraction of primordial metals

Upvotes: 2

Views: 2659

Answers (1)

ali_m
ali_m

Reputation: 74152

I think you just want to pass normed=True to np.histogram2d:

normed: bool, optional

If False, returns the number of samples in each bin. If True, returns the bin density bin_count / sample_count / bin_area.


If you wanted to compute the bin areas and do the normalization manually , the simplest way would probably be to use broadcasting:

x, y = np.random.rand(2, 1000)
xbin = np.logspace(-1, 0, 101)
ybin = np.logspace(-1, 0, 201)

# raw bin counts
counts, xe, ye = np.histogram2d(x, y, [xbin, ybin])

# size of each bin in x and y dimensions
dx = np.diff(xbin)
dy = np.diff(ybin)

# compute the area of each bin using broadcasting
area = dx[:,  None] * dy

# normalized counts
manual_norm = counts / area / x.shape[0]

# using normed=True
counts_norm, xe, ye = np.histogram2d(x, y, [xbin, ybin], normed=True)

print(np.allclose(manual_norm, counts_norm))
# True

Upvotes: 2

Related Questions