Curious

Reputation: 3587

python making list of tuples into definite sized intervals

I have x-values and their corresponding counts in a file. I read them in as a list of tuples of the following form:

dat = [(0.02, 1), 
(0.0211, 1), 
(0.021, 1), 
(0.023, 1), 
(0.0251, 1), 
(0.12, 2), 
(0.141, 1), 
(0.14, 3), 
(0.171, 1), 
(0.462, 9),
(0.467, 10),
(0.478, 15), 
(0.804, 20), 
(0.815, 31), 
(0.815, 24),
(2.72, 164), 
(2.78, 147), 
(2.8, 128),
(5.78, 6), 
(5.83, 1), 
(5.8603, 1),
(5.94, 17), 
(8.63, 3), 
(8.87, 5),  
(18.601, 1), 
(19.0, 7), 
(21.0, 2), 
(22.0, 4)]

How can I convert these into counts over equal-width intervals? For example, with intervals in 0.5 increments:

x    count
0    0
0.5  12
1.0  75
1.5  0
2.0  0
2.5  0
3.0  439
... 

Upvotes: 2

Views: 464

Answers (3)

K Z

Reputation: 30483

Here is a way to do it using only the Python standard library:

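import math

step_size = 0.5
result = {}
i = 0  # index of the first entry of dat not yet assigned to a bin

# Assumes dat is ordered by x; each key is the inclusive upper edge of its bin.
n_bins = int(math.ceil(max(dat)[0] / step_size)) + 1
for intval in [x * step_size for x in range(n_bins)]:
    result[intval] = 0
    for n, count in dat[i:]:
        if n > intval:
            break  # this entry belongs to a later bin
        result[intval] += count
        i += 1

print(sorted(result.items()))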


[(0.0, 0), (0.5, 46), (1.0, 75), (1.5, 0), (2.0, 0), (2.5, 0), (3.0, 439), (3.5, 0),
 (4.0, 0), (4.5, 0), (5.0, 0), (5.5, 0), (6.0, 25), (6.5, 0), (7.0, 0), (7.5, 0),
 (8.0, 0), (8.5, 0), (9.0, 8), (9.5, 0), (10.0, 0), (10.5, 0), (11.0, 0), (11.5, 0),
 (12.0, 0), (12.5, 0), (13.0, 0), (13.5, 0), (14.0, 0), (14.5, 0), (15.0, 0), (15.5, 0),
 (16.0, 0), (16.5, 0), (17.0, 0), (17.5, 0), (18.0, 0), (18.5, 0), (19.0, 8),
 (19.5, 0), (20.0, 0), (20.5, 0), (21.0, 2), (21.5, 0), (22.0, 4)]
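
The loop in this answer depends on the input being ordered by x. As a minimal sketch of an order-independent variant, the bin can be computed directly from each value. This keeps the same convention (each bin is labelled by its upper edge, and a value equal to an edge falls into that bin) and assumes non-negative x-values. The helper name bin_counts and the sample list are hypothetical, and floating-point rounding can shift values that sit exactly on an edge when the step is not exactly representable.

```python
import math
from collections import defaultdict


def bin_counts(pairs, step):
    """Sum counts into bins (edge - step, edge], labelled by upper edge."""
    totals = defaultdict(int)
    for x, count in pairs:
        # k is the index of the bin whose upper edge is k * step
        k = math.ceil(x / step)
        totals[k] += count
    if not totals:
        return []
    # Emit every bin from 0 up to the last occupied one, including empty bins
    return [(k * step, totals.get(k, 0)) for k in range(max(totals) + 1)]


# Deliberately unordered sample input
sample = [(2.72, 164), (0.02, 1), (0.804, 20), (0.467, 10), (0.12, 2)]
print(bin_counts(sample, 0.5))
```

Computing the bin index per value avoids the running index i, at the cost of one division per entry.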

Upvotes: 0

Wouter Overmeire

Reputation: 69236

An approach with pandas (with pandas imported as pd, numpy as np, and floor taken from math):

In [74]: df = pd.DataFrame.from_records(dat).set_index(0)

In [75]: counts = df.groupby(lambda x: floor(x / 0.5) * 0.5).count()

In [76]: counts
Out[76]: 
       1
0.0   12
0.5    3
2.5    3
5.5    4
8.5    2
18.5   1
19.0   1
21.0   1
22.0   1

You can fill the intervals with zero counts:

In [77]: counts.reindex(np.arange(0, 22.5, 0.5)).fillna(0)
Out[77]: 
       1
0.0   12
0.5    3
1.0    0
1.5    0
2.0    0
2.5    3
3.0    0
3.5    0
4.0    0

etc ...
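
Note that count() in the session above tallies the number of rows per bin rather than adding up the second column. If the summed counts are what is wanted, a sketch along the same lines could group on an integer bin index and call sum() instead. The column names "x" and "count", the sample list, and the variable names are assumptions made for illustration; bins are labelled by their lower edge, as in the groupby above.

```python
import numpy as np
import pandas as pd

sample = [(0.02, 1), (0.12, 2), (0.467, 10), (0.804, 20), (2.72, 164)]
step = 0.5

df = pd.DataFrame(sample, columns=["x", "count"])

# Integer bin index: bin k covers [k * step, (k + 1) * step)
idx = np.floor(df["x"] / step).astype(int)

# Sum the count column within each bin
totals = df.groupby(idx)["count"].sum()

# Add the empty bins with a zero total, then relabel by lower edge
totals = totals.reindex(range(idx.max() + 1), fill_value=0)
totals.index = totals.index * step

print(totals)
```

Grouping on integers and multiplying by the step only at the end avoids matching floating-point labels during the reindex.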

Upvotes: 3

Barney Szabolcs

Reputation: 12544

Here is a reasonable solution, with bin upper limits stored in bins.

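import numpy as np

min_bin_upper = 0
max_bin_upper = 100
bin_step = 0.5

# bins[i] is the exclusive upper limit of bin i.
# data is a list of (x, count) tuples, ordered by x.
bins = np.arange(min_bin_upper, max_bin_upper, bin_step)
counts = np.zeros(len(bins))
i = 0
for x, count in data:
    # Advance past every bin whose upper limit is <= x, so empty bins are skipped
    while i < len(bins) and x >= bins[i]:
        i += 1
    if i >= len(bins):
        break
    counts[i] += count

print(counts)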

I have tested it with

data = [(0.1, 3), (0.2, 1),(0.3, 10)]
min_bin_upper = 0
max_bin_upper = 1
bin_step = 0.2

It returned

[  0.   3.  11.   0.   0.]

I hope this is what you need.
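
For comparison, NumPy's histogram function accepts a weights argument, which makes it possible to sum the counts per bin without an explicit loop and without requiring ordered input. The following is a sketch on the same test data, not a drop-in replacement: here the edges array holds both ends of every bin, each bin is half-open [lower, upper) except the last, which also includes its upper edge, and the variable names are illustrative.

```python
import numpy as np

sample = [(0.1, 3), (0.2, 1), (0.3, 10)]
step = 0.2
n_bins = 5

xs = np.array([x for x, _ in sample])
weights = np.array([c for _, c in sample])

# n_bins + 1 edges: 0, 0.2, ..., 1.0 (integer range times step, to avoid
# the length surprises of np.arange with a fractional step)
edges = np.arange(n_bins + 1) * step

# Each x contributes its weight (the count) to the bin it falls into
totals, _ = np.histogram(xs, bins=edges, weights=weights)

print(totals)
```

Here totals[k] belongs to the bin starting at edges[k], so the result is shifted by one position relative to the upper-limit labelling used in the loop.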

Upvotes: 1
