Merging python lists based on a 'similar' float value

Question

I have a list of tuples and I want to merge entries whose first elements lie within a maximum distance of each other (that is, if the difference is < 0.05). I have the following list as an example:

[(0.0, 0.9811758192941256), (1.00422, 0.9998252466431066), (0.0, 0.9024831978342827), (2.00425, 0.9951777494430947)]

This should yield something like:

[(0.0, 1.883659017),(1.00422, 0.9998252466431066),(2.00425,0.9951777494430947)]

I think I can use something similar to the approach in this question (Merge nested list items based on a repeating value), although many other questions have similar answers. The only problem I see is that they use collections.defaultdict or itertools.groupby, which require the elements to match exactly. An important addition here is that I want the first element of a merged tuple to be the weighted mixture of the merged elements, for example:

If (1.001, 80) and (0.99, 20) are matched, the result should be (0.9988, 100).
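The weighted mixture in that example can be checked with a few lines of Python. This is only a sketch of the arithmetic: the helper name `weighted_merge` is hypothetical, and it assumes the second element of each tuple is a positive weight.

```python
def weighted_merge(points):
    """Collapse (position, weight) tuples into one (position, weight) tuple.

    The merged position is the weight-averaged position; the merged weight
    is the sum of the weights. Assumes the weights sum to a nonzero value.
    """
    total_weight = sum(w for _, w in points)
    position = sum(x * w for x, w in points) / total_weight
    return position, total_weight


merged = weighted_merge([(1.001, 80), (0.99, 20)])
print(merged)  # position is (1.001*80 + 0.99*20) / 100, i.e. about 0.9988
```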

Is something similar possible but with the matching based on value difference and not exact match?

What I was trying myself (but don't really like the look of it) is:

import itertools

Res = 0.05
combinations = itertools.combinations(list, 2)  # `list` is my list of tuples
for i in combinations:
  if i[0][0] > i[1][0]-Res and i[0][0] < i[1][0]+Res:
    newValue = ...

-- UPDATE --

Based on some comments and Dawg's answer, I tried the following approach:

data = {}
for fv, v in total:
    k = round(fv, 2)
    data[k] = data.get(k, 0) + v

using the following list (actual data example, instead of short example list):

total = [(0.0, 0.11630591852564721), (1.00335, 0.25158664272201053), (2.0067, 0.2707487305913156), (3.0100499999999997, 0.19327075057473678), (4.0134, 0.10295042331357719), (5.01675, 0.04364856520231155), (6.020099999999999, 0.015342958201863783), (0.0, 0.9811758192941256), (1.00422, 0.018649427348981), (0.0, 0.9024831978342827), (2.00425, 0.09269455160881204), (0.0, 0.6944298762418107), (0.99703, 0.2536959281304138), (1.99406, 0.045877927988415786)]

which then causes problems with values such as 2.0067 (rounded to 2.01) and 1.99406 (rounded to 1.99), where the actual difference is 0.01264 (far below 0.05, the value I had in mind as a limit for now, but which should be configurable). Rounding the values to 1 decimal place is also not an option, since that would result in a window of ~0.1: values such as 2.04999 and 1.95001 both yield 2.0 in that case.

The exact output was:

{0.0: 2.694394811895866, 1.0: 0.5239319982014053, 4.01: 0.10295042331357719, 5.02: 0.04364856520231155, 2.0: 0.09269455160881204, 1.99: 0.045877927988415786, 3.01: 0.19327075057473678, 6.02: 0.015342958201863783, 2.01: 0.2707487305913156}
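The bucket-boundary problem described above can be reproduced in isolation. The snippet below is a small illustration using the values quoted in the question; the variable names are made up for the example.

```python
EPSILON = 0.05  # the limit mentioned in the question

a, b = 2.0067, 1.99406

# The two positions are well within EPSILON of each other...
close_enough = abs(a - b) < EPSILON

# ...but rounding to 2 decimals puts them in different buckets.
keys_2dp = (round(a, 2), round(b, 2))

# Rounding to 1 decimal has the opposite problem: these two values are
# almost 0.1 apart, yet they end up sharing a key.
c, d = 2.04999, 1.95001
keys_1dp = (round(c, 1), round(d, 1))

print(close_enough, keys_2dp, keys_1dp)
```

Any fixed rounding grid has this issue, because whether two values share a key depends on where they fall relative to the grid rather than on how far apart they are.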

Adam Smith · Accepted Answer

accum = list()
data = [(0.0, 0.9811758192941256), (1.00422, 0.9998252466431066), (0.0, 0.9024831978342827), (2.00425, 0.9951777494430947)]

EPSILON = 0.05

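# True means the tuple has not been merged into a group yet
newdata = {d: True for d in data}
for k, v in data:
    if not newdata[(k,v)]: continue
    newdata[(k,v)] = False
    # use each piece of data only once
    # keys collects position*weight products, values collects the weights
    keys,values = [k*v],[v]
    for kk, vv in [d for d in data if newdata[d]]:
        if abs(k-kk) < EPSILON:
            keys.append(kk*vv)
            values.append(vv)
            newdata[(kk,vv)] = False
    # weighted mean of the positions, and the summed weight
    accum.append((sum(keys)/sum(values),sum(values)))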
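For longer lists, a sort-then-sweep variant may be worth considering. The sketch below is not the code above: it sorts the tuples first and anchors each group at its smallest position, whereas the loop above anchors each group at the first unused tuple in input order, so the two can group borderline values differently. The names `merge_close` and `_collapse` are hypothetical, and the weights in each group are assumed to sum to a nonzero value.

```python
def _collapse(group):
    """Return (weighted mean position, total weight) for a list of tuples."""
    total = sum(w for _, w in group)
    return (sum(p * w for p, w in group) / total, total)


def merge_close(pairs, epsilon=0.05):
    """Merge (position, weight) tuples whose positions lie within epsilon
    of the smallest position in the current group."""
    merged = []
    group = []
    for pos, weight in sorted(pairs):
        # Start a new group once we move too far from the group's anchor.
        if group and abs(pos - group[0][0]) >= epsilon:
            merged.append(_collapse(group))
            group = []
        group.append((pos, weight))
    if group:
        merged.append(_collapse(group))
    return merged


data = [(0.0, 0.9811758192941256), (1.00422, 0.9998252466431066),
        (0.0, 0.9024831978342827), (2.00425, 0.9951777494430947)]
result = merge_close(data)
print(result)
```

Sorting costs O(n log n) and the sweep is a single pass, compared with the nested loop over the data in the code above. Unlike the dictionary of flags, it also does not rely on every tuple in the list being distinct.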
