Reputation: 11
I have a data set of distances between two particles, and I want to bin these data in custom bins. For example, I want to see how many distance values lie in the interval from 1 to 2 micrometers, and so on. I wrote some code for this, and it seems to work. This is the relevant part:
import numpy as np
import matplotlib.pyplot as plt

# Custom binning of data
bins = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
fig, ax = plt.subplots(figsize=(30, 10))  # my full script uses an n-by-m grid because I actually have 5 histograms, but only one is posted here
ax.hist(dist_from_spacer1, bins=bins, edgecolor="k")
ax.set_xlabel('Distance from spacer 1 [µm]')
ax.set_ylabel('counts')
plt.xticks(bins)
plt.show()
However, now I want to extract the data values that fall in each interval and store them in lists. I tried to use:
np.histogram(dist_from_spacer1, bins=bins)
However, this only gives the number of data points in each bin and the bin edges, like this:
(array([ 0, 0, 44, 567, 481, 279, 309, 202, 117, 0]),
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]))
How can I get the exact data that belong to each histogram bin?
Upvotes: 0
Views: 1465
Reputation: 490
Yes, np.histogram computes only what is needed to draw a histogram: the bin boundaries and the count for each bin, not the individual data points. However, the bin boundaries are sufficient to achieve what you want with np.digitize:
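counts, bins = np.histogram(dist_from_spacer1)
indices = np.digitize(dist_from_spacer1, bins)
# np.digitize returns 0 for values below bins[0] and len(bins) for values >= bins[-1]
# (including the maximum), so len(bins) + 1 lists are needed to avoid an IndexError
lists = [[] for _ in range(len(bins) + 1)]
for i, x in zip(indices, dist_from_spacer1):
    lists[i].append(x)  # lists[i] holds values with bins[i-1] <= x < bins[i]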
In your case, the bin boundaries are predefined, so you can pass them to np.digitize directly, without calling np.histogram first.
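A minimal sketch of that direct approach, assuming half-open bins whose last bin also includes its right edge (the same convention np.histogram uses). The helper name values_per_bin and the sample distances are made up for illustration; they are not from the question.

```python
import numpy as np

def values_per_bin(data, edges):
    """Return one list of values per bin defined by consecutive edges (hypothetical helper)."""
    data = np.asarray(data)
    edges = np.asarray(edges)
    # np.digitize returns i such that edges[i-1] <= x < edges[i];
    # subtracting 1 gives 0-based bin numbers
    idx = np.digitize(data, edges) - 1
    # Fold values equal to the last edge into the final bin, as np.histogram does
    idx[data == edges[-1]] = len(edges) - 2
    # Values outside [edges[0], edges[-1]] match no k below and are skipped
    return [data[idx == k].tolist() for k in range(len(edges) - 1)]

# Made-up sample distances, not the asker's data
sample = [2.5, 3.1, 3.9, 4.0, 7.2, 10.0]
bins = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

groups = values_per_bin(sample, bins)
counts, _ = np.histogram(sample, bins=bins)
```

Here groups[k] holds the values between bins[k] and bins[k+1], and the length of each list should agree with the corresponding entry of counts, which is a quick way to check the grouping against the histogram.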
Upvotes: 1