Reputation: 13145
I have two dimentional data stored in a sorted list of tuples, as follows:
data = [(0.1,100), (0.13,300), (0.2,10)...
The first value in each tuple, the X value, only occurs once for the list of tuples. In other words, there can only be one value for 0.1 etc.
I then have a sorted list of buckets. A bucket is defined as a tuple containing a range and an id, as follows:
buckets = [((0,0.14), 2), ((0.135,0.19), 1), ((0.19,0.21), 2), ((0.19,0.24), 3)...
A range is with respect to the X axis. So, the id 2 has two buckets above and ids 1 and 3 have only one, respectively. The first bucket for id 2 has a range from 0 to 0.14. Please note that buckets can overlap.
So, I need an algorithm which drops the data into the buckets and then adds up the scores. For the data above, the result would be:
1:0
2:410
3:10
Notice how each piece of data is caught by a bucket associated with the id 2, hence it gets the score 100+300+10=410
.
How might I write an algorithm to do this?
Upvotes: 4
Views: 2281
Reputation: 22619
Turn each bucket definition (label range) into a callable that - given the data tuple - will increment the bucket total. Bucket values are stored in a simple dict. You can easily wrap this concept up in a class if you want to provide a simpler api.
def partition(buckets, bucket_definition):
"""Build a callable that increments the appropriate buckets with a value"""
lower, upper = bucket_definition[0]
key = bucket_definition[1]
def _partition(data):
x, y = data
# Set a default value for this key
buckets.setdefault(key, 0)
if lower <= x <= upper:
buckets[key] += y
return _partition
bucket_definitions = [
((0, 0.14), 2),
((0.135, 0.19), 1),
((0.19, 0.21), 2),
((0.19, 0.24), 3)
]
data = [(0.1, 100), (0.13, 300), (0.2, 10)]
# Holder for bucket labels and values
buckets = {}
# For each bucket definition (range, label) build a callable
partitioners = [partition(buckets, definition) for definition in bucket_definitions]
# Map each callable to each data tuple provided
for partitioner in partitioners:
map(partitioner, data)
print(buckets)
Upvotes: 1
Reputation: 4056
This produces the desired output from your test data:
data = [(0.1,100), (0.13,300), (0.2,10)]
buckets = [((0,0.14), 2), ((0.135,0.19), 1), ((0.19,0.21), 2), ((0.19,0.24), 3)]
totals = dict()
for bucket in buckets:
bucket_id = bucket[1]
if bucket_id not in totals:
totals[bucket_id] = 0
for data_point in data:
if data_point[0] >= bucket[0][0] and data_point[0] <= bucket[0][1]:
totals[bucket_id] += data_point[1]
for key in sorted(totals):
print("{}: {}".format(key, totals[key]))
Upvotes: 1
Reputation: 672
try this code:
data = [(0.1,100), (0.13,300), (0.2,10)]
buckets = [((0,0.14), 2), ((0.135,0.19), 1), ((0.19,0.21), 2), ((0.19,0.24), 3)]
def foo(tpl): ## determine the buckets a data-tuple is enclosed by list of IDs
x, s = tpl
lst = []
for bucket in buckets:
rnge, iid = bucket
if x>rnge[0] and x<rnge[1]: lst.append(iid)
return lst
data = [[dt, foo(dt)] for dt in data]
scores_dict = {}
for tpl in data:
score = tpl[0][1]
for iid in tpl[1]:
if iid in scores_dict: scores_dict[iid]+=score
else: scores_dict[iid] =score
for key in scores_dict:
print key,":",scores_dict[key]
This snippet results in:
2 : 410
3 : 10
If any bucket ID is not printed, there is no X value in that bucket or it sums zero.
Upvotes: 1