Reputation: 33223
I have data such as [0.1, 0.2, 1, 5, 100] and so on. What I want to do is count the number of items between
1-10
11-20
21-30
... and so on...
Right now, I have very messy code.
What I have done is map each range to a bucket ID:
1-10 => 0
11-20 => 1
... and so on.
So bucket 0 covers the range 1-10, bucket 1 covers the range 11-20, and so on.
And the code is:
for ele in data:
    bucket_id = get_bucket_id(ele)
    freq_dict[bucket_id] += 1
get_bucket_id is one big if/else block.
Is there a better way to do this?
Upvotes: 2
Views: 2371
Reputation: 22882
You could use numpy.histogram, which counts how many elements of your data fall into each of a set of intervals (bins). It returns the counts for each bin and the bin edges (one more edge than there are bins). By default it uses 10 equal-width bins spanning the range of the data:
>>> import numpy as np
>>> data = [0.1,0.2,1,5,100]
>>> hist, bin_edges = np.histogram( data )
>>> hist
array([4, 0, 0, 0, 0, 0, 0, 0, 0, 1])
>>> bin_edges
array([ 0.1 , 10.09, 20.08, 30.07, 40.06, 50.05, 60.04,
70.03, 80.02, 90.01, 100. ])
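The default bins above follow the range of the data, so they do not line up with buckets of width 10 starting near zero. A minimal sketch, assuming edges of 0, 10, 20, ..., 100 are what is wanted (these edge values are an assumption for illustration, not part of the answer), passes explicit edges through the bins argument:

```python
import numpy as np

data = [0.1, 0.2, 1, 5, 100]

# Assumed bucket edges: 0, 10, 20, ..., 100 (11 edges define 10 bins).
edges = np.arange(0, 101, 10)

# Each bin is half-open [left, right), except the last, which also
# includes its right edge, so 100 lands in the final bin.
hist, bin_edges = np.histogram(data, bins=edges)

print(hist.tolist())  # [4, 0, 0, 0, 0, 0, 0, 0, 0, 1]
```

Because the bins are half-open on the right, a value of exactly 10 would be counted in the second bin (10 to 20), which differs slightly from the 1-10, 11-20 scheme in the question.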
Upvotes: 5
Reputation: 250891
You can use collections.Counter and the bisect module here:
>>> from bisect import bisect_left
>>> from collections import Counter
>>> lis = range(0, 101, 10)
>>> l = [0.1, 0.2, 1, 5, 100, 11]
>>> c = Counter(bisect_left(lis, item) for item in l)
>>> c
Counter({1: 4, 10: 1, 2: 1})
>>> [c[i] for i in range(1, 11)]
[4, 1, 0, 0, 0, 0, 0, 0, 0, 1]
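The same idea can be wrapped in a small helper so the bucket boundaries are passed in explicitly. This is only a sketch: the name bucket_counts and the upper_bounds list are hypothetical, introduced here for illustration.

```python
from bisect import bisect_left
from collections import Counter

def bucket_counts(values, upper_bounds):
    """Count values per bucket, given sorted inclusive upper bounds.

    Bucket 0 holds values <= upper_bounds[0]; bucket i holds values in
    (upper_bounds[i-1], upper_bounds[i]]; the final extra bucket holds
    everything above the last bound.
    """
    counts = Counter(bisect_left(upper_bounds, v) for v in values)
    return [counts[i] for i in range(len(upper_bounds) + 1)]

upper_bounds = [10, 20, 30]  # assumed buckets: up to 10, 11-20, 21-30
print(bucket_counts([0.1, 0.2, 1, 5, 100, 11], upper_bounds))  # [4, 1, 0, 1]
```

Since bisect_left is used, a value equal to a bound (for example 10) stays in the lower bucket, which matches the inclusive ranges 1-10, 11-20 from the question.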
Upvotes: 1
Reputation: 1429
You could use len and filter:
c = []
for lo, hi in [(1, 10), (11, 20), (21, 30)]:  # ...
    # list() is needed on Python 3, where filter returns an iterator
    c.append(len(list(filter(lambda x: lo <= x <= hi, values))))
Upvotes: 1
Reputation: 363517
Use a Counter and compute the bucket using integer division.
from collections import Counter
freq = Counter()
for x in data:
    freq[(x - 1) // 10] += 1
Note that this maps values less than one (such as 0.1 and 0.2 in your data) to -1. When dealing with data that is not strictly positive, you'll actually want to use ranges 1-9, 10-19, etc.
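If the data contains fractional values between 0 and 1, as in the question, one possible variation (not part of the answer above, and assuming bucket k should cover the interval (10*k, 10*(k+1)]) is to round up instead of using floor division:

```python
import math
from collections import Counter

data = [0.1, 0.2, 1, 5, 100, 11]

# Assumption: bucket k covers (10*k, 10*(k+1)], so 0.1 and 10 both
# land in bucket 0, and 100 lands in bucket 9.
freq = Counter(math.ceil(x / 10) - 1 for x in data)

for bucket in sorted(freq):
    print(bucket, freq[bucket])
```

With this mapping only values less than or equal to zero end up in a negative bucket.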
Upvotes: 6