cf2

Reputation: 591

Compare two lists of unequal length and non-matching values

I want to compare two lists of different lengths that contain approximately matching integers. The goal is to make them the same length by removing the items from the longer list that have no counterpart in the shorter list. The lists:

list1 = [101, 201, 301, 402, 502, 603, 701, 802, 904, 10012, 10021, 10033, 10041, 10054, 10062, 10071, 10082, 10093, 10101]
list2 = [102, 203, 504, 601, 703, 901, 10013, 10071, 10082, 10093, 10103]

However, matching values are not exactly equal in the two lists; in this example they can differ by anywhere from 0 to 3.

The result would look like this:

resultlist1 = [101, 201, 502, 603, 701, 904, 10012, 10071, 10082, 10093, 10101]
resultlist2 = [102, 203, 504, 601, 703, 901, 10013, 10071, 10082, 10093, 10103]

removed_items_list1 = [2, 3, 7, 10, 11, 12, 13, 14]  # indices of the items removed from list1

I tried the following, without success:

set(list1).intersection(list2)

This only returns exact matches.
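For comparison, a tolerance-aware analogue of that intersection can be sketched with the standard `bisect` module. This is a minimal sketch, not code from the question: the helper name `tolerant_matches` is hypothetical, and it only checks whether each item of the longer list has *some* partner within `margin`; it does not enforce one-to-one, in-order pairing.

```python
from bisect import bisect_left


def tolerant_matches(longer, shorter, margin=3):
    """Return the indices of items in `longer` that lie within `margin`
    of at least one item in `shorter` (hypothetical helper)."""
    candidates = sorted(shorter)  # sort once so each lookup is a binary search
    kept = []
    for index, value in enumerate(longer):
        # Position of the smallest candidate >= value - margin
        pos = bisect_left(candidates, value - margin)
        # That candidate is a match only if it is also <= value + margin
        if pos < len(candidates) and candidates[pos] <= value + margin:
            kept.append(index)
    return kept


list1 = [101, 201, 301, 402, 502, 603, 701, 802, 904, 10012, 10021, 10033,
         10041, 10054, 10062, 10071, 10082, 10093, 10101]
list2 = [102, 203, 504, 601, 703, 901, 10013, 10071, 10082, 10093, 10103]

kept = tolerant_matches(list1, list2)
kept_set = set(kept)
removed = [i for i in range(len(list1)) if i not in kept_set]
print(removed)  # [2, 3, 7, 10, 11, 12, 13, 14]
```

Sorting the shorter list once makes each lookup O(log m). With these example lists it keeps exactly the expected items, but on data where one value sits within `margin` of two neighbours it can keep both.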

for i in range(len(list2)):
    if abs(list1[i] - list2[i]) > 3:
        del list1[i]

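This does not remove all unwanted values. It compares the lists position by position, and each `del` shifts the remaining items of list1 one place to the left, so later comparisons are misaligned (with the example lists, 402 survives while 502 is deleted).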

How can I compare these two lists of unequal length and remove the non-matches (within a given tolerance) from the longer list?

Upvotes: 3

Views: 4373

Answers (2)

jme

Reputation: 20695

Here is a solution that runs in linear time. The other approaches take quadratic time, which may be perfectly fine if your input is small.

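def align(shorter, longer, margin=3):
    result = []
    removed = []

    # Walk through the longer list only once, remembering each item's index
    longer = enumerate(longer)

    for target in shorter:
        # Skip items of the longer list until one lies within `margin` of target
        while True:
            index, current = next(longer)
            if abs(current - target) <= margin:
                result.append(current)
                break
            else:
                removed.append(index)

    return result, removed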

This assumes the lists can always be aligned as in your example. If that isn't guaranteed, you'll need to add some error checking to the above.
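One way to add that error checking is sketched below. The name `align_checked` is hypothetical, and raising `ValueError` (rather than, say, skipping the unmatched item) is an assumed policy. It also reports any items left over at the end of the longer list as removed, which `align` above does not do.

```python
def align_checked(shorter, longer, margin=3):
    """Pair each item of `shorter` with the next item of `longer` within
    `margin`; return (kept, removed_indices). Hypothetical helper."""
    kept = []
    removed = []
    it = enumerate(longer)  # shared iterator: each inner loop resumes where the last stopped
    for target in shorter:
        for index, current in it:
            if abs(current - target) <= margin:
                kept.append(current)
                break
            removed.append(index)
        else:
            # The inner loop exhausted `longer` without finding a partner
            raise ValueError("no item within %d of %r" % (margin, target))
    # Items after the last match have no partner in `shorter`
    removed.extend(index for index, _ in it)
    return kept, removed
```

Called as `align_checked(list2, list1)`, it should give the same result as `align` for the example lists, since the last item of list1 is matched.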

Example:

>>> align(list2, list1)
([101, 201, 502, 603, 701, 904, 10012, 10071, 10082, 10093, 10101],
 [2, 3, 7, 10, 11, 12, 13, 14])

Upvotes: 1

dnalow

Reputation: 984

You can use NumPy array comparison with broadcasting to test every pair of items at once:

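import numpy as np

list1 = [101, 201, 301, 402, 502, 603, 701, 802, 904, 10012, 10021, 10033, 10041, 10054, 10062, 10071, 10082, 10093, 10101]
list2 = [102, 203, 504, 601, 703, 901, 10013, 10071, 10082, 10093, 10103]
l1 = np.array(list1)
l2 = np.array(list2)

# ind[i, j] is True when list2[i] and list1[j] differ by at most 3
ind = np.abs(l1 - l2[:, None]) <= 3

print(l1[ind.any(axis=0)])         # items of list1 that have a partner in list2
print(l2[ind.any(axis=1)])         # items of list2 that have a partner in list1
print(ind.any(axis=1))
print(ind.any(axis=0))
print(np.where(~ind.any(axis=0)))  # indices of list1 items with no partner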

results in

[  101   201   502   603   701   904 10012 10071 10082 10093 10101]

[  102   203   504   601   703   901 10013 10071 10082 10093 10103]

[ True  True  True  True  True  True  True  True  True  True  True]

[ True  True False False  True  True  True False  True  True False False
 False False False  True  True  True  True]

(array([ 2,  3,  7, 10, 11, 12, 13, 14]),)
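The broadcast comparison builds a len(list2) by len(list1) boolean matrix, which can get large for long lists. A lower-memory alternative, sketched here on the assumption that only the per-item match mask is needed, sorts the shorter list once and uses `np.searchsorted`. The name `match_mask` is hypothetical, and like the broadcast version it does not enforce one-to-one pairing.

```python
import numpy as np


def match_mask(longer, shorter, margin=3):
    """Boolean mask over `longer`: True where some item of `shorter`
    lies within `margin` (hypothetical helper)."""
    longer = np.asarray(longer)
    candidates = np.sort(np.asarray(shorter))
    if candidates.size == 0:
        return np.zeros(longer.shape, dtype=bool)
    # For every item at once: index of the first candidate >= item - margin
    pos = np.searchsorted(candidates, longer - margin, side="left")
    # pos may equal candidates.size; clip it for indexing and mask it out below
    in_range = pos < candidates.size
    nearest = candidates[np.minimum(pos, candidates.size - 1)]
    return in_range & (nearest <= longer + margin)


list1 = [101, 201, 301, 402, 502, 603, 701, 802, 904, 10012, 10021, 10033,
         10041, 10054, 10062, 10071, 10082, 10093, 10101]
list2 = [102, 203, 504, 601, 703, 901, 10013, 10071, 10082, 10093, 10103]

mask = match_mask(list1, list2)
print(np.asarray(list1)[mask].tolist())
print(np.flatnonzero(~mask).tolist())  # [2, 3, 7, 10, 11, 12, 13, 14]
```

Memory stays proportional to the two list lengths rather than their product, at the cost of one sort.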

Upvotes: 0
