Reputation: 45
I'm fairly new to programming and I want to compare two lists of lists in Python, where the floats in these lists may contain a small error. Here is an example:
first_list = [['ATOM', 'N', 'SER', -1.081, -16.465, 17.224],
              ['ATOM', 'C', 'SER', 2.805, -3.504, 6.222],
              ['ATOM', 'O', 'SER', -17.749, 16.241, -1.333]]
secnd_list = [['ATOM', 'N', 'SER', -1.082, -16.465, 17.227],
              ['ATOM', 'C', 'SER', 2.142, -3.914, 6.222],
              ['ATOM', 'O', 'SER', -17.541, -16.241, -1.334]]
Expected Output:
Differences = ['ATOM', 'C', 'SER', 2.805, -3.504, 6.222]
What I've tried so far:
def aprox (x, y):
    if x == float and y == float:
        delta = 0.2 >= abs(x - y)
        return delta
    else: rest = x, y
    return rest
def compare (data1, data2):
    diff = [x for x,y in first_list if x not in secnd_list and aprox(x,y)] + [x for x,y in secnd_list if x not in first_list and aprox(x,y)]
    return diff
Or using tuples, but then I don't know how to build in the approximation:
def compare (data1, data2):
    first_set = set(map(tuple, data1))
    secnd_set = set(map(tuple, data2))
    diff = first_set.symmetric_difference(secnd_set)
    return diff
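A quick check of why the set-based version cannot tolerate small errors: set membership relies on hashing and exact equality, so rows that differ by only 0.001 still count as different. This sketch uses just the first row of each list from the example above.

```python
first = [['ATOM', 'N', 'SER', -1.081, -16.465, 17.224]]
second = [['ATOM', 'N', 'SER', -1.082, -16.465, 17.227]]

# Lists are unhashable, so each row is turned into a tuple first (as in compare).
first_set = set(map(tuple, first))
second_set = set(map(tuple, second))

# Set operations use exact equality, so a 0.001 difference is enough
# for the two rows to be treated as distinct elements.
diff = first_set.symmetric_difference(second_set)
print(len(diff))  # 2: both versions of the row are reported
```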
Hope you can help me! :)
Upvotes: 0
Views: 177
Reputation: 566
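The line
if x == float and y == float
is wrong: it compares the values themselves with the type object float, so the condition is always False, for numbers and strings alike.
The proper way to check the type of a variable is to use the type()
function.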
Try replacing the above line with
if type(x) is float and type(y) is float:
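A short sketch showing why the original condition never fires, and what aprox could look like with the type check fixed. The 0.2 tolerance comes from the question; the `tol` parameter name and comparing non-float items (such as 'ATOM' or 'SER') with plain == are assumptions for illustration.

```python
def aprox(x, y, tol=0.2):
    # Floats are compared within a tolerance (tol is a hypothetical parameter).
    # Assumption: any non-float items must match exactly.
    if type(x) is float and type(y) is float:
        return abs(x - y) <= tol
    return x == y

# The original test compares a value with the type object itself:
print(2.805 == float)        # False
print(type(2.805) is float)  # True

print(aprox(-1.081, -1.082))  # True: differs by 0.001
print(aprox(2.805, 2.142))    # False: differs by about 0.663
print(aprox('SER', 'SER'))    # True
```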
Upvotes: 5
Reputation: 18218
Maybe you can iterate through the rows of both lists in parallel and compare their sub-elements.
When a pair of sub-elements is not equal, the row can be added to the results depending on its type: if two strings differ, the row is added directly; if they are floats, math.isclose()
can be used for the approximate comparison.
Note: I made a correction to match the expected output; there is a missing negative sign in the third row of first_list.
import math
first_list = [['ATOM', 'N', 'SER', -1.081, -16.465, 17.224],
              ['ATOM', 'C', 'SER', 2.805, -3.504, 6.222],
              ['ATOM', 'O', 'SER', -17.749, -16.241, -1.333]]  # changed: 16.241 -> -16.241
secnd_list = [['ATOM', 'N', 'SER', -1.082, -16.465, 17.227],
              ['ATOM', 'C', 'SER', 2.142, -3.914, 6.222],
              ['ATOM', 'O', 'SER', -17.541, -16.241, -1.334]]
diff = []
for e1, e2 in zip(first_list, secnd_list):
    for e_sub1, e_sub2 in zip(e1, e2):
        # if sub-elements are not equal
        if e_sub1 != e_sub2:
            # if it is a string and not equal
            if isinstance(e_sub1, str):
                diff.append(e1)
                break  # one sub-element differs, so no need to check the rest
            else:  # it is a float and not equal
                # rel_tol=2e-1 is a relative tolerance (20% of the larger magnitude),
                # not an absolute difference of 0.2
                if not math.isclose(e_sub1, e_sub2, rel_tol=2e-1):
                    diff.append(e1)
                    break  # one sub-element differs, so no need to check the rest
print(diff)
Output:
[['ATOM', 'C', 'SER', 2.805, -3.504, 6.222]]
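Note that rel_tol in math.isclose() is relative: the allowed gap scales with the size of the numbers. If the 0.2 from the question is meant as a fixed absolute threshold, abs_tol (with rel_tol=0.0) expresses that instead. A small sketch on the first coordinate of the corrected third row, whose values differ by about 0.208:

```python
import math

a, b = -17.749, -17.541  # third row, first coordinate; gap is about 0.208

# rel_tol: allowed gap is 0.2 * max(|a|, |b|), roughly 3.55 here.
relative = math.isclose(a, b, rel_tol=0.2)
# rel_tol=0.0 with abs_tol=0.2: allowed gap is exactly 0.2.
absolute = math.isclose(a, b, rel_tol=0.0, abs_tol=0.2)

print(relative)  # True
print(absolute)  # False
```

So with an absolute 0.2 threshold, the 'O' row would be reported as well as the 'C' row.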
Upvotes: 0
Reputation: 14236
This is a bit clunky, but I wrote it on the fly and it should give you the desired results. Note that your code sets the threshold at 0.2,
which means two rows should be returned, not one as in your expected output.
def discrepancies(x, y):
    for row1, row2 in zip(x, y):
        # compare only the numeric columns (index 3 onward)
        for item1, item2 in zip(row1[3:], row2[3:]):
            if abs(item1 - item2) >= 0.2:
                print(row1)
                break  # one differing value is enough to report the row
discrepancies(first_list, secnd_list)
Output:
['ATOM', 'C', 'SER', 2.805, -3.504, 6.222]
['ATOM', 'O', 'SER', -17.749, 16.241, -1.333]
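A couple of caveats: the nested loops compare every numeric value, so the running time grows with the number of rows times the number of columns. In Python 2, zip() builds the full list of pairs up front, so for large lists I would use itertools.izip, which yields the pairs lazily; in Python 3, zip() is already lazy. Hope this helps!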
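A sketch of the same check written as a generator that yields the differing rows instead of printing them, so the caller can collect or stream the results. The `threshold` parameter is a hypothetical addition for illustration; like the code above, it assumes the numeric columns start at index 3 and uses the question's original data.

```python
first_list = [['ATOM', 'N', 'SER', -1.081, -16.465, 17.224],
              ['ATOM', 'C', 'SER', 2.805, -3.504, 6.222],
              ['ATOM', 'O', 'SER', -17.749, 16.241, -1.333]]
secnd_list = [['ATOM', 'N', 'SER', -1.082, -16.465, 17.227],
              ['ATOM', 'C', 'SER', 2.142, -3.914, 6.222],
              ['ATOM', 'O', 'SER', -17.541, -16.241, -1.334]]

def discrepancies(x, y, threshold=0.2):
    """Yield each row of x whose numeric columns (index 3 on) differ from
    the matching row of y by at least `threshold`."""
    # In Python 3, zip() is lazy, much like itertools.izip in Python 2.
    for row1, row2 in zip(x, y):
        if any(abs(a - b) >= threshold for a, b in zip(row1[3:], row2[3:])):
            yield row1

result = list(discrepancies(first_list, secnd_list))
for row in result:
    print(row)
# ['ATOM', 'C', 'SER', 2.805, -3.504, 6.222]
# ['ATOM', 'O', 'SER', -17.749, 16.241, -1.333]
```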
Upvotes: 0