Reputation: 65
I have a large list of numbers, and I want to see if any of them are approximately equal. If 2 numbers are "approximately equal" (for my purposes), both of them fall within 10% of each other (see the following 2 examples.) Then I want to sort them into separate lists of approximately equal numbers.
Example #1 Compare 5.0 and 5.5: 5.5 +/- 10% = 4.95 to 6.05 (and 5.0 is in this range) 5.0 +/- 10% = 4.50 to 5.50 (and 5.5 is in this range) Therefore, 5.0 and 5.5 are approximately equal.
Example #2 Compare 5.0 and 5.6: 5.6 +/- 10% = 5.04 to 6.16 (and 5.0 is in this range) 5.0 +/- 10% = 4.50 to 5.50 (and 5.6 is in NOT this range) Therefore, 5.0 and 5.6 are NOT approximately equal.
Summary of what I need to do: Input = {4.0, 4.1, 4.2, 4.0, 9.0, 9.4, 8.9, 4.3} Desired output = {4.0, 4.1, 4.2, 4.0, 4.3} and {9.0, 9.4, 8.9}
Upvotes: 4
Views: 5078
Reputation: 11686
Here is a generator / set based method.
def set_gen(nums):
for seed in sorted(nums):
yield tuple([n for n in nums if seed <= n and n/seed <= 1.1])
def remove_subsets(sets):
for s in sets.copy():
[sets.remove(s2) for s2 in sets.difference([s]) if set(s2).issubset(s)]
>>> nums = [4.0, 4.1, 4.2, 4.0, 9.0, 9.4, 8.9, 4.3]
>>> x = set(num for num in set_gen(nums))
>>> remove_subsets(x)
>>> list(x)
[(9.0, 9.4, 8.9), (4.0, 4.1, 4.2, 4.0, 4.3)]
>>> nums = [1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2.0]
>>> x = set(num for num in set_gen(nums))
>>> remove_subsets(x)
>>> list(x)
[(1.9, 1.8), (1.5, 1.4), (1.4, 1.3), (1.2, 1.1), (1.7, 1.6), (1.5, 1.6), (1.3, 1.2), (1.9, 2.0), (1.0, 1.1), (1.8, 1.7)]
Upvotes: 0
Reputation: 103834
You could do this:
Input = {4.0, 4.1, 4.2, 4.0, 9.0, 9.4, 8.9, 4.3}
wl=sorted(Input,reverse=True)
apr=.1
out={}
while wl:
wn=wl.pop()
out[wn]=[wn]
while wl and wl[-1]<=wn*(1+apr):
out[wn].append(wl.pop())
print [(k,out[k]) for k in sorted(out.keys())]
Prints:
[(4.0, [4.0, 4.1, 4.2, 4.3]), (8.9, [8.9, 9.0, 9.4])]
Trying the example in the comments:
>>> Input = {1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2.0}
Prints:
[(1.0, [1.0, 1.1]), (1.2, [1.2, 1.3]), (1.4, [1.4, 1.5]), (1.6, [1.6, 1.7]), (1.8, [1.8, 1.9]), (2.0, [2.0])]
Upvotes: -1
Reputation: 2962
input_list = [4.0, 4.1, 4.2, 4.0, 9.0, 9.4, 8.9, 4.3]
results = {input_list[0]: [input_list[0]]} # Start with first value
for value in input_list[1:]: # loop through our entire list after first value
hi = value * 1.1
low = value * 0.9
print("Value: {0}\tHi: {1}\tLow:{2}".format(value, hi, low))
for existing in results: # search through our result set
found_similar = False
if low < existing < hi: # if we find a match
results[existing].append(value) # we add our value to the list for that set
found_similar = True
break
if not found_similar: # if we looped through our entire results without a match
results[value] = [value] # Create a new entry in our results dictionary
for entry in results:
print(results[entry])
Will give:
results = { 9.0: [9.0, 9.4, 8.9],
4.0: [4.0, 4.1, 4.2, 4.0, 4.3] }
This code starts with the first value in your list, and finds all subsequent values that are within 10% of that one. So in your example, it starts with 4, and finds all similar values. Any value that isn't within 10 % get added to a new "set".
So once it reaches 9.0, it sees that it's not a match, so it adds a new result set to the results
dictionary, with a key of 9.0
. Now when it considers 9.4, it doesn't find a match in the 4.0 list, but it does find a match in the 9.0 list. So it adds this value to the second result set.
Upvotes: 3