A. W.
A. W.

Reputation: 75

Minimum summation of difference of terms in two lists

Let's say I have two python lists like so:

[30, 400, 500]

[55, 396, 478]

I want to find the sum of minimum (absolute value of the) difference between the elements. In this case it would be easy: (55-30) + (400-396) + (500-478) = 51

But how would I go about doing this efficiently when the lists don't have an equal number of elements. For example:

Set 1:

list1 = [30, 400, 500]

list2 = [412, 489]

or even if it was

Set 2

list1 = [30, 400, 500]

list2 = [24, 563]

lastly,

Set 3

list1 = [30, 50]

list2 = [20, 31, 90]

For Set 1, the answer would be (412-400) + (500-489) = 23

For Set 2, the answer would be (30-24) + (563-500) = 69

For Set 3, the answer would be (30-20) + (50-31) =29

I can't compare by element. In set 1, the sum of the minimum difference is achieved by comparing the second element of list1 to the first element of list2, and the third element of list1 to the second element of list2. In set 2, the sum of the minimum difference is achieved by comparing the first element of list1 to the first element of list2, and the third element of list1 to the second element of list2.

Any help is appreciated.

Some other info:

Upvotes: 2

Views: 2014

Answers (6)

Seyi Shoboyejo
Seyi Shoboyejo

Reputation: 529

Okay before jumping into coding this is how I would reason about the problem: 1. Simply calculate all the possible values. 2. Just take out the minimum I don't think anything more complex will be more efficient because, eventually, you still have to test all the combinations to have full certainty. With this in mind I will do:

ll1, ll2 = len(l1), len(l2) 
if ll2 < ll1:
    l1, l2, ll1, ll2 = l2, l1, ll2, ll1
# Now any longer list will be l2 and ll2 >= ll1

At this stage we need a function to be able to split a single list into a list of lists where each child list (that is item) has the length given by the number specified. They also cannot contain the same item (from the split list) twice. Enter itertools.

from itertools import combinations, permutations 
# All the lists within l2 that will be mixed with l1 (that is they have same length as l1) :
l2_sublists = combinations(l2, ll1) 
mixes = [l1 + item for item in l2_sublists] 

To get all the sums of differences for each mix we find all the combinations; partition them in twos; then for each combination sum the absolute values of the difference of the items in each partition...

diffs = (sum(abs(p[0] - p[1]) for p in (perm[i:i + 2] for i in range(0, len(perm), 2))) for m in mixes for perm in permutations(m, 2 * ll1)) 
result = min(diffs) 
print(result)

Upvotes: 1

Solaxun
Solaxun

Reputation: 2792

If i've understood this correctly, I believe the following should work:

list1 = [30, 400, 500]
list2 = [412, 489]

diffs = []
pairs = []
for l2 in list2:
    min_diff = float('inf')
    pair     = None
    for l1 in list1:
        abs_diff = abs(l2-l1)
        if abs_diff < min_diff:
            min_diff = abs_diff
            pair = (l1,l2)
    diffs.append(min_diff)
    pairs.append(pair)

print(diffs)
print(sum(diffs))
print(pairs)

A mistake was pointed out in the comments, here is the revised version.

import itertools
def min_abs_diff(l1,l2):
    bigger, smaller = sorted([l1,l2],key=len,reverse=True)
    diffs = [abs(x-y) for x,y in itertools.product(bigger,smaller)]
    return sum(min(diffs[i*len(bigger):(i+1)*len(bigger)]) 
               for i in range(len(diffs)//len(bigger)))

Upvotes: 0

musale
musale

Reputation: 584

Use sorted and zip.

>>> list1 = [30, 400, 500]
>>> list2 = [412, 489]
>>> l3 = zip(sorted(list1), sorted(list2))
>>> s = 0
>>> for i in l3:
...   s += abs(i[0] - i[1])
...
>>> s
23

If you need to still use the "hanging" values in the list you can use zip_longest with fillvalue being the default value to pair the hanging values. Then with sorted, you can add reverse=True to change list to descending order.

Edit

With the added info, removing reverse=True pretty much does it.

Upvotes: 0

kuppern87
kuppern87

Reputation: 1135

In order to be sure to get the right answer I would use a bipartite weighted matching, where the abs-diff between each pair are the weights. This will avoid all the pitfalls from sorting based approaches, such as

list1=[30, 50], list2=[20, 31, 90], ans= 29

where most intuitve algorithms would pair 30 with 31. (giving sum 41)

Here is a solution using scipy's linear_sum_assignment:

import numpy as np
from scipy.optimize import linear_sum_assignment
def min_diff_sum(list1, list2):
    arr1 = np.asanyarray(list1)
    arr2 = np.asanyarray(list2)
    cost_matrix = np.abs(arr1-arr2[:, None])
    pairs = linear_sum_assignment(cost_matrix)
    return np.sum(cost_matrix[pairs])

This should always give the right result.

In [45]: min_diff_sum([30, 400, 500], [412, 489])
Out[45]: 23

In [46]: min_diff_sum([30, 400, 500], [24, 563])
Out[46]: 69

Upvotes: 4

Shivam Yadav
Shivam Yadav

Reputation: 26

One way to do this problem is to choose the smaller list first. Take numbers one by one from smaller list and search for the minimum absolute difference (keep track of index as well) and once you found the minimum absolute difference add it to your final sum and delete that element from bigger list so you won't consider that again.

This solution is O(NM). Assuming list size constraint are N, M for list1 and list2 respectively. You can optimise the solution to O(NLogN + NLogM) it by sorting the bigger list in O(NLogN) and using binary search to find the minimum absolute difference.

Upvotes: 1

nosklo
nosklo

Reputation: 222852

you could use the bisect module:

import bisect

list1 = [30, 400, 500]
list2 = [412, 489]


list1.sort() # list1 must be sorted

result = []

for el in sorted(list2): # walk through the elements in sorted order
    pos = bisect.bisect_left(list1, el) # find the closest elements
    if pos >= len(list1): # el is bigger than last element, use it
        pos -= 1
    elif pos > 0 and abs(list1[pos-1] - el) <= abs(list1[pos] - el):
        pos = pos - 1
    result.append(abs(list1[pos] - el))
    del list1[pos]

print(result)

results in [12, 11] (which is [412-400, 500-489])

If you use list2 = [24, 563] then you get [6, 63] (which is [30-24, 563-500])

Upvotes: 1

Related Questions