Reputation: 75
I am given 3 lists, and I have to find two values in the first two lists whose sum is as close as possible to each value in the third list, then return their indices (one-based). If multiple solutions are equally close, either one may be returned.
I have a working solution, but while it was fine on semi-large inputs, it is too slow for large ones (for example, all 3 lists of length 10,000).
So the question is basically: how can you find an exact solution to this problem without having to calculate every possible combination of list1 and list2?
Sample input:
3
2 2 5
1.000002 0.000002
0.500000 -0.500000
0.500001 0.500002 0.500003 1.000000 0.000001
2 2 5
1.000002 0.000001
0.500000 -0.500000
0.500001 0.500002 0.500003 1.000000 0.000001
5 4 7
0.000001 0.000002 0.000003 0.000004 0.000005
0.000002 0.000010 0.000001 -0.000001
0.000001 0.000002 0.000100 0.000005 0.000020 0.000010 0.000003
Sample output (newlines added between test cases for readability; they are not present in the script output):
2 1
2 1
2 1
2 1
2 2

2 1
1 2
1 2
1 2
2 2

2 4
3 4
5 2
4 3
5 2
1 2
4 4
My current solution:
"""Given an input file, which contains multiple sets of 3 lists each,
find two values in list1 and list2 whose sum is as close as possible to each
element in list3"""
from sys import argv
from time import time
start = time()
def parser(file):
"""Reads a file, returns it as a list of reads, where each read contains
an info line, list1, list2, list3"""
lines = open(argv[1], 'r').readlines()
read = []
tests = int(lines.pop(0))
for line in lines:
read.append(line.strip())
reads = []
for n in range(tests):
reads.append(read[4 * n:4*(n+1)])
return reads
def dict_of_sums(list1, list2):
"""Creates a dict, whose keys are the sums of all values in list1 and
list2, and whose values are the indices of those values in list1 and
list2"""
sums = {}
m = len(list1)
k = len(list2)
for a in range(m):
for b in range(k):
combination = str(a + 1) + ' ' + str(b + 1)
sum = float(list1[a]) + float(list2[b])
sum = round(sum, 6)
sums[sum] = combination
return sums
def find_best_combination(ordered, list3, c):
"""Finds the best combination using binary search: takes a number c,
and searches through the ordered list to find the closest sum.
Returns that sum"""
num = float(list3[c])
lower, upper = 0, len(ordered)
while True:
idx = (lower + upper) // 2
value = ordered[idx]
if value == num:
return value
if value > num:
upper = idx
elif value < num:
lower = idx
if lower + 1 == upper:
for z in [-1, 0, 1]:
totest = idx + z
if z == -1:
delta = (ordered[totest] - num) ** 2
best = totest
else:
deltanew = (ordered[totest] - num) ** 2
if deltanew < delta:
delta = deltanew
best = totest
return ordered[best]
reads = parser(argv[1])
for i in reads:
m, k, n = i.pop(0).split()
m, k, n = int(m), int(k), int(n)
list1, list2, list3 = i[0].split(), i[1].split(), i[2].split()
results = dict_of_sums(list1, list2)
ordered = []
# Create an ordered list of all possible sums of the values in list1 and
# list2
for k in results.keys():
ordered.append(k)
ordered = sorted(ordered)
# Loops over list3, searching the closest sum. Prints the indices of its
# constituent numbers in list1 and list2
for c in range(n):
res = find_best_combination(ordered, list3, c)
results[res]
end = time()
print(end - start)
Upvotes: 0
Views: 292
Reputation: 46455
Your current solution is O(n^2 log(n)) time and O(n^2) memory. The reason is that your ordered is a list of size n^2 that you then sort and then do lots and lots of binary searches on. This gives you much poorer constants, and a chance of going into swap.
In your case of 10,000 elements each, you have a dictionary with up to 100,000,000 keys that you then sort and walk through. That is billions of operations and GB of data. If your machine winds up in swap, those operations will slow down a lot and you have a problem.
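To put rough numbers on that, here is a back-of-envelope sketch (assuming the worst case, where rounding to 6 decimals merges no keys; 24 bytes is the size of a CPython float object, so the true dictionary footprint is larger still):

import math

m = k = 10_000
pairs = m * k                         # 100,000,000 candidate sums
sort_cmps = pairs * math.log2(pairs)  # ~2.7 billion comparisons to sort them
float_bytes = pairs * 24              # lower bound on memory for the sums
print(f"{pairs:,} sums, ~{sort_cmps / 1e9:.1f}B comparisons, "
      f"~{float_bytes / 1e9:.1f} GB for the float objects alone")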
I would suggest that you sort lists 2 and 3. Then, for each l1 in list 1, you can walk through l1 + l2 in parallel with walking through l3, recording the best match for each element of l3. Here is that approach in pseudo-code:
record best for every element in list 3 to be list1[1] + list2[1]
foreach l1 in list 1:
    start l2 at start of list2
    start l3 at start of list3
    while can advance in both list 2 and list 3:
        if advancing in list2 improves l1 + l2 as approx of l3:
            advance in list 2
        else:
            if l1 + l2 is better than best recorded approx of l3:
                record l1 + l2 as best for l3
            advance in list 3
    while can advance in list 3:
        if l1 + l2 is better than best recorded approx of l3:
            record l1 + l2 as best for l3
        advance in list 3
    if l1 + l2 is better than best recorded approx of l3:
        record l1 + l2 as best for l3
This requires sorted versions of list2 and list3, plus a lookup from each element of list3 to the best approximation found so far. In your example of 10,000 items each, you have 2 data structures of size 10,000, and have to do roughly 200,000,000 operations. Better than billions, and no problem with pressing against memory limits.
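For concreteness, here is a minimal runnable Python sketch of that merge-style walk (the name closest_pairs is mine, not from the question, and the pointer loop is restructured slightly so the tail of list3 is handled in the same pass; it assumes the lists are already parsed into floats):

def closest_pairs(list1, list2, list3):
    """For each t in list3, return one-based indices (i, j) such that
    list1[i-1] + list2[j-1] is as close to t as possible."""
    # sort list2 and list3, remembering the original one-based positions
    l2 = sorted((v, j + 1) for j, v in enumerate(list2))
    l3 = sorted((v, c) for c, v in enumerate(list3))
    best = [None] * len(list3)  # best[c] = (error, index1, index2)
    for i, v1 in enumerate(list1, start=1):
        p2 = 0  # only ever moves forward within one pass over l3
        for v3, c in l3:  # targets in increasing order
            # advance in list2 while that improves v1 + l2 as approx of v3
            while (p2 + 1 < len(l2)
                   and abs(v1 + l2[p2 + 1][0] - v3) < abs(v1 + l2[p2][0] - v3)):
                p2 += 1
            err = abs(v1 + l2[p2][0] - v3)
            if best[c] is None or err < best[c][0]:
                best[c] = (err, i, l2[p2][1])
    return [(i, j) for _, i, j in best]

Because p2 never moves backwards, each pass over l3 costs O(k + n), giving O(m*(k + n)) after the two sorts, which is where the roughly 200,000,000 operations figure comes from. On the first test case from the question it prints one closest pair per element of list3 (ties may come out as the other, equally close pair, which the question explicitly allows):

list1 = [1.000002, 0.000002]
list2 = [0.500000, -0.500000]
list3 = [0.500001, 0.500002, 0.500003, 1.000000, 0.000001]
for i, j in closest_pairs(list1, list2, list3):
    print(i, j)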
Upvotes: 2