Reputation: 3587
I have x-values and corresponding counts in a file. I read them as a list of tuples in the following form:
dat = [(0.02, 1),
(0.0211, 1),
(0.021, 1),
(0.023, 1),
(0.0251, 1),
(0.12, 2),
(0.141, 1),
(0.14, 3),
(0.171, 1),
(0.462, 9),
(0.467, 10),
(0.478, 15),
(0.804, 20),
(0.815, 31),
(0.815, 24),
(2.72, 164),
(2.78, 147),
(2.8, 128),
(5.78, 6),
(5.83, 1),
(5.8603, 1),
(5.94, 17),
(8.63, 3),
(8.87, 5),
(18.601, 1),
(19.0, 7),
(21.0, 2),
(22.0, 4)]
How can I convert these into counts over equal intervals? For example, intervals with 0.5 increments:
x count
0 0
0.5 12
1.0 75
1.5 0
2.0 0
2.5 0
3.0 439
...
Upvotes: 2
Views: 464
Reputation: 30483
Here is a way to do it using Python standard lib:
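import math
step_size = 0.5
result = {}
i = 0  # number of items of dat already assigned to a bin
# assumes dat is sorted by x, at least from one bin to the next;
# each bin is labelled by its upper edge
n_bins = int(math.ceil(max(dat)[0] / step_size)) + 1
for intval in [x * step_size for x in range(n_bins)]:
    result[intval] = 0
    for n, count in dat[i:]:
        if n > intval:
            break
        result[intval] += count
        i += 1
print(sorted(result.items(), key=lambda x: x[0]))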
[(0.0, 0), (0.5, 46), (1.0, 75), (1.5, 0), (2.0, 0), (2.5, 0), (3.0, 439), (3.5, 0),
(4.0, 0), (4.5, 0), (5.0, 0), (5.5, 0), (6.0, 25), (6.5, 0), (7.0, 0), (7.5, 0),
(8.0, 0), (8.5, 0), (9.0, 8), (9.5, 0), (10.0, 0), (10.5, 0), (11.0, 0), (11.5, 0),
(12.0, 0), (12.5, 0), (13.0, 0), (13.5, 0), (14.0, 0), (14.5, 0), (15.0, 0), (15.5, 0),
(16.0, 0), (16.5, 0), (17.0, 0), (17.5, 0), (18.0, 0), (18.5, 0), (19.0, 8),
(19.5, 0), (20.0, 0), (20.5, 0), (21.0, 2), (21.5, 0), (22.0, 4)]
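The same upper-edge binning can also be done without relying on the order of the input, by computing each value's bin index directly. This is only a minimal sketch, assuming non-negative x-values and that a bin labelled u covers (u - step_size, u]; the function name bin_counts is hypothetical, and values lying exactly on an edge may be affected by floating-point rounding for step sizes such as 0.1.

```python
import math
from collections import defaultdict

def bin_counts(pairs, step_size):
    """Sum counts per bin; a bin labelled u covers (u - step_size, u]."""
    totals = defaultdict(int)
    for x, count in pairs:
        # index of the smallest bin upper edge that is >= x
        k = int(math.ceil(x / step_size))
        totals[k] += count
    last = max(totals) if totals else 0
    # emit every bin from 0 up to the last occupied one, including empty bins
    return [(k * step_size, totals.get(k, 0)) for k in range(last + 1)]

print(bin_counts([(0.1, 3), (0.6, 2), (0.4, 1)], 0.5))
```

Called as bin_counts(dat, 0.5) on the data from the question, this should give the same list as above.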
Upvotes: 0
Reputation: 69236
An approach with pandas:
In [73]: import numpy as np; import pandas as pd; from math import floor
In [74]: df = pd.DataFrame.from_records(dat).set_index(0)
In [75]: counts = df.groupby(lambda x: floor(x / 0.5) * 0.5).count()
In [76]: counts
Out[76]:
1
0.0 12
0.5 3
2.5 3
5.5 4
8.5 2
18.5 1
19.0 1
21.0 1
22.0 1
You can fill the intervals with zero counts:
In [77]: counts.reindex(np.arange(0, 22.5, 0.5)).fillna(0)  # stop at 22.5 so the 22.0 bin is kept
Out[77]:
1
0.0 12
0.5 3
1.0 0
1.5 0
2.0 0
2.5 3
3.0 0
3.5 0
4.0 0
etc ...
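Note that .count() gives the number of rows falling in each bin. If the goal is instead to add up the second column per bin (as the 75 and 439 in the question's expected output suggest), .sum() can be used. A minimal sketch under that assumption, with a shortened sample of the data and the hypothetical column names x and count; bins are labelled by their lower edge, as in the session above:

```python
import numpy as np
import pandas as pd

# shortened sample in the same (x, count) form as dat
sample = [(0.02, 1), (0.462, 9), (0.804, 20), (0.815, 31), (2.72, 164), (2.8, 128)]
step = 0.5

df = pd.DataFrame(sample, columns=["x", "count"])
# label each row with the lower edge of its bin, e.g. 0.804 -> 0.5
lower_edge = np.floor(df["x"] / step) * step
totals = df.groupby(lower_edge)["count"].sum()
# add the empty bins between 0 and the last occupied bin, filled with zero
all_edges = np.arange(0, totals.index.max() + step, step)
totals = totals.reindex(all_edges, fill_value=0)
print(totals)
```

Passing fill_value=0 to reindex keeps the totals as integers, whereas reindex followed by fillna(0) converts them to floats.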
Upvotes: 3
Reputation: 12544
Here is a reasonable solution, with the bin upper limits stored in bins.
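import numpy as np
min_bin_upper = 0
max_bin_upper = 100
bin_step = 0.5
bins = np.arange(min_bin_upper, max_bin_upper, bin_step)
counts = np.zeros(len(bins))
i = 0
# data must be sorted by x
for e in data:
    # skip ahead to the first bin whose upper limit exceeds the x-value
    # (a while loop, so that gaps spanning several bins are handled)
    while i < len(bins) and e[0] >= bins[i]:
        i += 1
    if i >= len(bins):
        break
    counts[i] += e[1]
print(counts)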
I have tested it with
data = [(0.1, 3), (0.2, 1),(0.3, 10)]
min_bin_upper = 0
max_bin_upper = 1
bin_step = 0.2
It returned
[ 0. 3. 11. 0. 0.]
I hope this is what you need.
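NumPy's histogram function can do the same accumulation when the counts are passed as weights, and it does not need the data to be sorted. A minimal sketch, assuming the same test data and bins of the form [edge, edge + bin_step) (the last bin also includes its right edge); the names x, w, n_bins and edges are illustrative only:

```python
import numpy as np

data = [(0.1, 3), (0.2, 1), (0.3, 10)]
bin_step = 0.2
n_bins = 5

x = np.array([e[0] for e in data])
w = np.array([e[1] for e in data])
# n_bins + 1 edges: 0.0, 0.2, ..., 1.0 (built from integers to avoid
# surprises with a floating-point stop value in np.arange)
edges = bin_step * np.arange(n_bins + 1)
# weights=w makes each x contribute its count instead of 1
counts, _ = np.histogram(x, bins=edges, weights=w)
print(counts)
```

Here counts[k] covers the interval starting at edges[k], so the totals 3 and 11 appear at positions 0 and 1 rather than at positions 1 and 2 as in the loop above.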
Upvotes: 1