Bugsy

Reputation: 45

Python - Comparison of list of lists with approximating floats

I'm fairly new to programming and I want to compare two lists of lists in Python, where the floats in these lists may differ by a small error. Here is an example:

first_list = [['ATOM', 'N', 'SER', -1.081, -16.465,  17.224], 
              ['ATOM', 'C', 'SER', 2.805, -3.504,  6.222], 
              ['ATOM', 'O', 'SER', -17.749, 16.241,  -1.333]]

secnd_list = [['ATOM', 'N', 'SER', -1.082, -16.465,  17.227],
              ['ATOM', 'C', 'SER', 2.142, -3.914,  6.222], 
              ['ATOM', 'O', 'SER', -17.541, -16.241,  -1.334]]

Expected Output:

Differences = ['ATOM', 'C', 'SER', 2.805, -3.504,  6.222]

What I have tried so far:

def aprox (x, y):
    if x == float and y == float:
        delta = 0.2 >= abs(x - y)
        return delta
    else: rest = x, y
    return rest

def compare (data1, data2):
    diff = [x for x,y in first_list if x not in secnd_list and aprox(x,y)] + [x for x,y in secnd_list if x not in first_list and aprox(x,y)]
    return diff

Or with the help of tuples, but there I don't know how to build in the approximation:

def compare (data1, data2):
    first_set = set(map(tuple, data1))
    secnd_set = set(map(tuple, data2))
    diff = first_set.symmetric_difference(secnd_set)
    return diff

Hope you can help me! :)

Upvotes: 0

Views: 177

Answers (3)

Sanjay SS

Reputation: 566

The line

if x == float and y == float

is incorrect: it compares the value of x with the type object float itself, so it is False for any number. One way to check the type of a variable is the type() function. Try replacing the line above with

if type(x) is float and type(y) is float:
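
For illustration, here is a minimal sketch of the question's helper with a working type check. It uses isinstance, a common alternative to type(x) is float, rather than the exact line above. The name approx, the tol parameter and the exact-equality fallback for non-floats are assumptions made for this example, not part of the original code.

```python
def approx(x, y, tol=0.2):
    """Return True if x and y match: floats within tol, anything else exactly."""
    if isinstance(x, float) and isinstance(y, float):
        # same rule as the question: a gap of at most tol counts as equal
        return abs(x - y) <= tol
    # strings (and any other non-float values) must match exactly
    return x == y


print(approx(-1.081, -1.082))  # True
print(approx(2.805, 2.142))    # False
print(approx('N', 'N'))        # True
```

Returning a plain bool in both branches keeps the helper usable inside an if or a list comprehension.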

Upvotes: 5

niraj

Reputation: 18218

Maybe you can iterate through the rows of both lists in parallel and compare their sub-elements pairwise. When a pair of sub-elements is not equal, what happens depends on the type: if they are strings, the row is added to the results right away; if they are floats, math.isclose() can be used for the approximate comparison first:

Note: to match the expected output, I corrected first_list: the second coordinate of its third row was missing a negative sign (16.241 became -16.241).

import math

first_list = [['ATOM', 'N', 'SER', -1.081, -16.465,  17.224], 
              ['ATOM', 'C', 'SER', 2.805, -3.504,  6.222], 
              ['ATOM', 'O', 'SER', -17.749, -16.241,  -1.333]] # changes made

secnd_list = [['ATOM', 'N', 'SER', -1.082, -16.465,  17.227],
              ['ATOM', 'C', 'SER', 2.142, -3.914,  6.222], 
              ['ATOM', 'O', 'SER', -17.541, -16.241,  -1.334]]

diff = []
for e1, e2 in zip(first_list, secnd_list):
    for e_sub1, e_sub2 in zip(e1, e2):
        # if sub-elements are not equal
        if e_sub1 != e_sub2:
            # if it is string and not equal
            if isinstance(e_sub1, str):
                diff.append(e1)
                break # one element not equal so no need to iterate other sub-elements
            else:  # is float and not equal
                # rel_tol=2e-1 is a relative tolerance of 20%, not an absolute gap of 0.2
                if not math.isclose(e_sub1, e_sub2, rel_tol=2e-1):
                    diff.append(e1)
                    break # one element not equal so no need to iterate other sub-elements
print(diff)

Output:

[['ATOM', 'C', 'SER', 2.805, -3.504, 6.222]]
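
To make the tolerance choice concrete, here is a small sketch comparing the two kinds of tolerance math.isclose accepts. It uses the x-coordinates of the two 'O' rows; the variable names a and b are only for this example. If an absolute gap of 0.2 is what the question means, abs_tol may be the closer fit, and in that case the 'O' row would be reported as well.

```python
import math

a, b = -17.749, -17.541  # x-coordinates of the 'O' rows, about 0.208 apart

# Relative tolerance: the allowed gap is 20% of the larger magnitude (about 3.55).
print(math.isclose(a, b, rel_tol=0.2))               # True

# Absolute tolerance: the allowed gap is 0.2 regardless of magnitude.
print(math.isclose(a, b, rel_tol=0.0, abs_tol=0.2))  # False
```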

Upvotes: 0

gold_cy

Reputation: 14236

This is kind of clunky, since I wrote it on the fly, but it should get you the desired results. As I mentioned, in your code you set the threshold at 0.2, which means two rows should be returned, not one as in your expected output.

def discrepancies(x, y):
    # walk both lists row by row
    for row1, row2 in zip(x, y):
        # compare only the numeric columns (index 3 onwards)
        for item1, item2 in zip(row1[3:], row2[3:]):
            if abs(item1 - item2) >= 0.2:
                print(row1)
                break  # one mismatch is enough to report the row

discrepancies(first_list, secnd_list)
['ATOM', 'C', 'SER', 2.805, -3.504, 6.222]
['ATOM', 'O', 'SER', -17.749, 16.241, -1.333]

A couple of caveats: the nested loops take time proportional to rows times columns, so this gets slower as the lists grow. For larger inputs on Python 2, itertools.izip avoids building intermediate lists; on Python 3 the built-in zip is already lazy. Hope this helps!
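
If you would rather get the rows back as a list than print them, a variant could look like the sketch below. The function name differing_rows and the parameters tol and n_labels are assumptions for illustration. It also compares the three label columns, which the function above skips, and it uses a strict > so that a gap of exactly 0.2 still counts as equal, as in the question's aprox.

```python
def differing_rows(rows_a, rows_b, tol=0.2, n_labels=3):
    """Return the rows of rows_a whose labels or coordinates differ from rows_b."""
    result = []
    for row_a, row_b in zip(rows_a, rows_b):
        # the leading string columns must match exactly
        labels_differ = row_a[:n_labels] != row_b[:n_labels]
        # the numeric columns may differ by at most tol
        coords_differ = any(
            abs(p - q) > tol
            for p, q in zip(row_a[n_labels:], row_b[n_labels:])
        )
        if labels_differ or coords_differ:
            result.append(row_a)
    return result


first_list = [['ATOM', 'N', 'SER', -1.081, -16.465, 17.224],
              ['ATOM', 'C', 'SER', 2.805, -3.504, 6.222],
              ['ATOM', 'O', 'SER', -17.749, 16.241, -1.333]]

secnd_list = [['ATOM', 'N', 'SER', -1.082, -16.465, 17.227],
              ['ATOM', 'C', 'SER', 2.142, -3.914, 6.222],
              ['ATOM', 'O', 'SER', -17.541, -16.241, -1.334]]

# With the question's original data this reports the 'C' and 'O' rows.
print(differing_rows(first_list, secnd_list))
```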

Upvotes: 0
