Reputation: 643
I have set up the ranges of my bins, and I want to know how to add one to a bin's count whenever a data point falls within that bin's range. Essentially, I want to count how many data points fall in each bin so that I can use that count as the "frequency" when I graph it.
My bin ranges are set by:
bins = [(i*bin_width, (i+1)*bin_width) for i in range(num_bins)]
and my data looks something like:
2.55619101399
2.55619101399
2.55619101399
3.615
4.42745271008
2.55619101399
2.55619101399
2.55619101399
4.42745271008
3.615
2.55619101399
4.42745271008
5.71581687075
5.71581687075
3.615
2.55619101399
2.55619101399
2.55619101399
2.55619101399
2.55619101399
Upvotes: 3
Views: 20986
Reputation: 365657
Since you're using NumPy, you (a) shouldn't be trying to create lists and loop over them instead of using arrays, and (b) should look to see if what you want to do is already built-in (or available in SciPy or Pandas or some other library built on NumPy), because often it is.
And numpy.histogram is exactly what you want.
It takes the overall range (lowest and highest edge) rather than a bin width, but other than that, it's trivial to plug in the values you already have and get back the values you want:
hist, edges = np.histogram(
    data_points,
    bins=num_bins,
    range=(0, bin_width*num_bins),
    density=False)
The hist array will contain the counts for each bin (like bin_counts in my other answer), which is what you want to post-process and eventually graph.
The edges array you may or may not need. It holds the same information as the bins in your original question, but in a different format: instead of [(0, .1), (.1, .2), (.2, .3)] it's [0, .1, .2, .3].
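As a rough illustration of the call above, here is a minimal sketch that runs it on the sample values from the question. The values bin_width = 1.0 and num_bins = 6 are assumptions chosen for the example, not something the question specifies. It also shows one way to rebuild the (start, stop) pairs from edges, if you still want them in the original format.

```python
import numpy as np

# Sample values copied from the question (20 points in total).
data_points = np.array(
    [2.55619101399] * 12 + [3.615] * 3
    + [4.42745271008] * 3 + [5.71581687075] * 2
)

# Assumed for illustration only; the question does not give these values.
bin_width = 1.0
num_bins = 6

hist, edges = np.histogram(
    data_points,
    bins=num_bins,
    range=(0, bin_width * num_bins),
)

# Rebuild (start, stop) pairs from consecutive edges, mirroring the
# list-of-tuples layout used in the question.
bins = list(zip(edges[:-1], edges[1:]))

for (start, stop), count in zip(bins, hist):
    print(f"{start:.1f} to {stop:.1f}: {count}")
```

Note that numpy.histogram treats every bin as half-open except the last one, which also includes its right edge, and it ignores values that fall outside the given range.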
Upvotes: 6
Reputation: 23
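from collections import Counter

frequency_data = Counter()
for d in data:
    # Binary search over the sorted bins.
    # Assumes every d lies inside the range covered by bins.
    new_bins = bins
    median = len(new_bins) // 2  # integer division, so it is a valid index
    # Half-open test [start, stop) so a value on a bin edge still matches.
    while not new_bins[median][0] <= d < new_bins[median][1]:
        if d < new_bins[median][0]:
            new_bins = new_bins[:median]
        else:
            # Drop the checked bin too, so the search always shrinks.
            new_bins = new_bins[median + 1:]
        median = len(new_bins) // 2
    frequency_data[new_bins[median]] += 1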
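The same binary-search idea can also be sketched with the standard library's bisect module instead of a hand-written loop. This is a swapped-in technique rather than the code above, and the bin_width, num_bins, and short data list here are assumptions made up for the example.

```python
from bisect import bisect_right
from collections import Counter

# Assumed for illustration only.
bin_width = 1.0
num_bins = 6
bins = [(i * bin_width, (i + 1) * bin_width) for i in range(num_bins)]
data = [2.55619101399, 3.615, 4.42745271008, 5.71581687075, 2.55619101399]

# Sorted lower edges of the bins; bisect searches this list.
starts = [start for start, _ in bins]

frequency_data = Counter()
for d in data:
    # Index of the last bin whose start is <= d.
    i = bisect_right(starts, d) - 1
    # Skip values that fall below the first bin or beyond the last one.
    if 0 <= i < len(bins) and d < bins[i][1]:
        frequency_data[bins[i]] += 1

print(frequency_data)
```

Unlike the loop above, this version quietly skips values outside the binned range instead of failing on them.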
Upvotes: 0
Reputation: 365657
Well, first, each of your bins is just a tuple of the start and end values of that bin, so there's no way to add anything to it. You could change each bin into, say, a list of [start, stop, 0] instead of a tuple of (start, stop), or, maybe even better, an object. Or, alternatively, you could keep a separate bin_counts list, parallel to the bins list, and, e.g., zip them up when needed.
Next, if each bin goes from i * bin_width to (i+1) * bin_width, then how do you get the i value from a data value? That's easy: the opposite of multiply is divide, so it's just data_point // bin_width.
So:
bin_counts = [0] * len(bins)
for data_point in data_points:
    # // on a float returns a float, so convert it before indexing
    bin_number = int(data_point // bin_width)
    bin_counts[bin_number] += 1
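For a self-contained version you can run, here is a minimal sketch of the same floor-division idea with a bounds check added. The bin_width, num_bins, and data values are assumptions picked for the example (7.2 is deliberately out of range), and the skipped counter is a hypothetical addition, not part of the approach above.

```python
# Assumed for illustration only.
bin_width = 0.5
num_bins = 12
bins = [(i * bin_width, (i + 1) * bin_width) for i in range(num_bins)]
data_points = [2.55619101399, 3.615, 4.42745271008,
               5.71581687075, 2.55619101399, 7.2]

bin_counts = [0] * num_bins
skipped = 0  # hypothetical: counts values that land outside every bin

for data_point in data_points:
    # Floor division gives the bin index; int() makes it usable as an index.
    bin_number = int(data_point // bin_width)
    if 0 <= bin_number < num_bins:
        bin_counts[bin_number] += 1
    else:
        skipped += 1

# Pair each bin with its count, as suggested with zip earlier.
for (start, stop), count in zip(bins, bin_counts):
    if count:
        print(f"{start} to {stop}: {count}")
```

The bounds check matters because a negative value would otherwise produce a negative index and silently increment a bin at the wrong end of the list, while a too-large value would raise an IndexError.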
Showing one of the other options, because I think you were asking about it in the comments:
bins = [[i*bin_width, (i+1)*bin_width, 0] for i in range(num_bins)]
for data_point in data_points:
    # // on a float returns a float, so convert it before indexing
    bin_number = int(data_point // bin_width)
    bins[bin_number][2] += 1
Here, each bin is a list of [start, stop, count], instead of having a list of (start, stop) bins and a separate list of count values.
Upvotes: 3