Reputation: 591
I want to compare two lists containing matching integers of different length. The goal is to make them the same length by removing items from the longer list based on missing values from the shorter list. The lists:
list1 = [101, 201, 301, 402, 502, 603, 701, 802, 904, 10012, 10021, 10033, 10041, 10054, 10062, 10071, 10082, 10093, 10101]
list2 = [102, 203, 504, 601, 703, 901, 10013, 10071, 10082, 10093, 10103]
However the matching values are not exactly the same for both list and can vary between 0 and 3 in this example.
The result would look like this:
resultlist1 = [101, 201, 502, 603, 701, 904, 10012, 10073, 10082, 10093, 10101]
resultlist2 = [102, 203, 504, 601, 703, 901, 10013, 10071, 10082, 10093, 10103]
removed_items_list1 = [2, 3, 7, 10, 11, 12, 13, 14] # Index numbers of
I tried the following without success
set(list1).intersection(list2)
Only returns exact matches
for i in xrange(len(list2)):
if abs(list1[i] - list2[i]) > 3:
del list1[i]
Does not remove all unwanted values
How would I compare these two lists with unequal length and remove non-matches (within a certain variation) in the longer list?
Upvotes: 3
Views: 4373
Reputation: 20695
Here is a solution that takes linear time; the others take quadratic time, though that may be just fine if your input is small.
def align(shorter, longer, margin=3):
result = []
removed = []
longer = enumerate(longer)
for target in shorter:
while True:
index, current = next(longer)
if abs(current - target) <= margin:
result.append(current)
break
else:
removed.append(index)
return result, removed
This assumes that you can always align the lists as in your example. If this isn't true you'll need to add some error checking to the above.
Example:
>>> align(list2, list1)
([101, 201, 502, 603, 701, 904, 10012, 10071, 10082, 10093, 10101],
[2, 3, 7, 10, 11, 12, 13, 14])
Upvotes: 1
Reputation: 984
You can use numpy array comparison:
list1 = [101, 201, 301, 402, 502, 603, 701, 802, 904, 10012, 10021, 10033, 10041, 10054, 10062, 10071, 10082, 10093, 10101]
list2 = [102, 203, 504, 601, 703, 901, 10013, 10071, 10082, 10093, 10103]
import numpy as np
l1 = np.array(list1)
l2 = np.array(list2)
ind = abs(l1 - l2[:,None]) <= 3
print l1[ind.max(0)]
print l2[ind.max(1)]
print ind.max(1)
print ind.max(0)
print np.where(~(ind.max(0)))
results in
[ 101 201 502 603 701 904 10012 10071 10082 10093 10101]
[ 102 203 504 601 703 901 10013 10071 10082 10093 10103]
[ True True True True True True True True True True True]
[ True True False False True True True False True True False False
False False False True True True True]
(array([ 2, 3, 7, 10, 11, 12, 13, 14]),)
Upvotes: 0