kamacite

Reputation: 11

Extracting data from a histogram with custom bins in Python

I have a data set of distances between two particles, and I want to sort these data into custom bins. For example, I want to see how many distance values lie in the interval from 1 to 2 micrometers, and so on. I wrote some code for this, and it seems to work. This is the relevant part:

#Custom binning of data

bins= [0,1,2,3,4,5,6,7,8,9,10]
fig, ax = plt.subplots(n,m,figsize = (30,10)) #using this because I actually have 5 histograms, but only posted one here
ax.hist(dist_from_spacer1, bins=bins, edgecolor="k")
ax.set_xlabel('Distance from spacer 1 [µm]')
ax.set_ylabel('counts')
plt.xticks(bins)
plt.show()

However, now I wish to extract those data values from the intervals, and store them into lists. I tried to use:

np.histogram(dist_from_spacer1, bins=bins)

However, this only gives the number of data points in each bin and the bin edges, like this:

(array([  0,   0,  44, 567, 481, 279, 309, 202, 117,   0]),
 array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10]))

How can I get the exact data that belong to each histogram bin?

Upvotes: 0

Views: 1465

Answers (1)

ronpi

Reputation: 490

Yes, np.histogram only computes what a histogram needs, namely the bin boundaries and the count in each bin, so it does not return the individual data points. The bin boundaries are sufficient to achieve what you want, though, by using np.digitize:

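import numpy as np

counts, bins = np.histogram(dist_from_spacer1)
# digitize returns i with bins[i-1] <= x < bins[i]; 0 = below first edge, len(bins) = at or above last edge
indices = np.digitize(dist_from_spacer1, bins)
lists = [[] for _ in range(len(bins) + 1)]  # one extra slot so index len(bins) is valid
for i, x in zip(indices, dist_from_spacer1):
    lists[i].append(x)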

In your case, the bin boundaries are predefined, so you can pass them to np.digitize directly and skip the np.histogram call.
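For the predefined edges from the question, the np.digitize approach could look roughly like the sketch below. The helper name values_per_bin and the sample distances are made up for illustration; only np.digitize and np.histogram are real NumPy calls. It assumes you want the same bin convention as np.histogram (half-open bins, with the last bin also including its right edge) and that values outside the edges should be dropped.

```python
import numpy as np

def values_per_bin(data, edges):
    """Return one array per bin holding the values that fall in that bin.

    Hypothetical helper: bins are [edges[k], edges[k+1]), except the last
    bin, which also includes its right edge (same convention as np.histogram).
    """
    data = np.asarray(data)
    edges = np.asarray(edges)
    # np.digitize gives i with edges[i-1] <= x < edges[i]; subtract 1 for a 0-based bin number
    idx = np.digitize(data, edges) - 1
    # Values exactly on the last edge belong to the last bin, as in np.histogram
    idx[data == edges[-1]] = len(edges) - 2
    # Values outside the edges end up with idx -1 or len(edges) - 1 and match no bin
    return [data[idx == k] for k in range(len(edges) - 1)]

# Made-up sample distances standing in for dist_from_spacer1
sample = np.array([2.5, 3.1, 3.7, 4.2, 2.9, 8.0, 10.0, 11.3])
bins = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

groups = values_per_bin(sample, bins)
counts, _ = np.histogram(sample, bins=bins)

for k, values in enumerate(groups):
    print(f"[{bins[k]}, {bins[k + 1]}): {values.tolist()}")
```

The length of each group should match the corresponding count from np.histogram, which is a quick way to check the grouping against the plotted histogram.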

Upvotes: 1
