Reputation: 75
I am given 3 lists, and I have to find two values in the first two lists whose sum is as close as possible to each value in the third list, then return their indices (one-based). If multiple solutions are equally close, either one may be returned.
I have a working solution, but while it was fine on semi-large inputs, it is too slow for large ones (for example, all 3 lists of length 10,000).
So the question is basically: how can you find an exact solution to this problem without having to calculate every possible combination of list1 and list2?
Sample input:
3
2 2 5
1.000002 0.000002
0.500000 -0.500000
0.500001 0.500002 0.500003 1.000000 0.000001
2 2 5
1.000002 0.000001
0.500000 -0.500000
0.500001 0.500002 0.500003 1.000000 0.000001
5 4 7
0.000001 0.000002 0.000003 0.000004 0.000005
0.000002 0.000010 0.000001 -0.000001
0.000001 0.000002 0.000100 0.000005 0.000020 0.000010 0.000003
Sample output (newlines added between test cases for readability; they are not present in the script output):
2 1
2 1
2 1
2 1
2 2

2 1
1 2
1 2
1 2
2 2

2 4
3 4
5 2
4 3
5 2
1 2
4 4
My current solution:
"""Given an input file, which contains multiple sets of 3 lists each,
find two values in list1 and list2 whose sum is as close as possible to each
element in list3"""
from sys import argv
from time import time
start = time()
def parser(file):
"""Reads a file, returns it as a list of reads, where each read contains
an info line, list1, list2, list3"""
lines = open(argv[1], 'r').readlines()
read = []
tests = int(lines.pop(0))
for line in lines:
read.append(line.strip())
reads = []
for n in range(tests):
reads.append(read[4 * n:4*(n+1)])
return reads
def dict_of_sums(list1, list2):
"""Creates a dict, whose keys are the sums of all values in list1 and
list2, and whose values are the indices of those values in list1 and
list2"""
sums = {}
m = len(list1)
k = len(list2)
for a in range(m):
for b in range(k):
combination = str(a + 1) + ' ' + str(b + 1)
sum = float(list1[a]) + float(list2[b])
sum = round(sum, 6)
sums[sum] = combination
return sums
def find_best_combination(ordered, list3, c):
"""Finds the best combination using binary search: takes a number c,
and searches through the ordered list to find the closest sum.
Returns that sum"""
num = float(list3[c])
lower, upper = 0, len(ordered)
while True:
idx = (lower + upper) // 2
value = ordered[idx]
if value == num:
return value
if value > num:
upper = idx
elif value < num:
lower = idx
if lower + 1 == upper:
for z in [-1, 0, 1]:
totest = idx + z
if z == -1:
delta = (ordered[totest] - num) ** 2
best = totest
else:
deltanew = (ordered[totest] - num) ** 2
if deltanew < delta:
delta = deltanew
best = totest
return ordered[best]
reads = parser(argv[1])
for i in reads:
m, k, n = i.pop(0).split()
m, k, n = int(m), int(k), int(n)
list1, list2, list3 = i[0].split(), i[1].split(), i[2].split()
results = dict_of_sums(list1, list2)
ordered = []
# Create an ordered list of all possible sums of the values in list1 and
# list2
for k in results.keys():
ordered.append(k)
ordered = sorted(ordered)
# Loops over list3, searching the closest sum. Prints the indices of its
# constituent numbers in list1 and list2
for c in range(n):
res = find_best_combination(ordered, list3, c)
results[res]
end = time()
print(end - start)
Upvotes: 0
Views: 292
Reputation: 46455
Your current solution is O(n^2 log(n)) time and O(n^2) memory. The reason is that your ordered is a list of size n^2 that you then sort and then do lots and lots of binary searches on. This gives you much poorer constants, and a chance of going into swap.
In your case of 10,000 elements each, you have a dictionary with up to 100,000,000 keys that you then sort and walk through. That is billions of operations and GB of data. If your machine winds up in swap, those operations will slow down a lot and you have a problem.
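To put rough numbers on that, here is a back-of-envelope sketch (assuming the worst case, where rounding to 6 decimals merges no keys; 24 bytes is the size of a CPython float object, so the true dictionary footprint is larger still):

import math

m = k = 10_000
pairs = m * k                         # 100,000,000 candidate sums
sort_cmps = pairs * math.log2(pairs)  # ~2.7 billion comparisons to sort them
float_bytes = pairs * 24              # lower bound on memory for the sums
print(f"{pairs:,} sums, ~{sort_cmps / 1e9:.1f}B comparisons, "
      f"~{float_bytes / 1e9:.1f} GB for the float objects alone")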
I would suggest that you sort lists 2 and 3. Then, for each l1 in list 1, you can walk through l1 + l2 in parallel with walking through l3, recording the best match for each element of l3. Here is that approach in pseudo-code:
record best for every element in list 3 to be list1[1] + list2[1]
foreach l1 in list 1:
    start l2 at start of list2
    start l3 at start of list3
    while can advance in both list 2 and list 3:
        if advancing in list2 improves l1 + l2 as approx of l3:
            advance in list 2
        else:
            if l1 + l2 is better than best recorded approx of l3:
                record l1 + l2 as best for l3
            advance in list 3
    while can advance in list 3:
        if l1 + l2 is better than best recorded approx of l3:
            record l1 + l2 as best for l3
        advance in list 3
    if l1 + l2 is better than best recorded approx of l3:
        record l1 + l2 as best for l3
This requires sorted versions of list2 and list3, plus a lookup from each element of list3 to the best approximation found so far. In your example of 10,000 items each, you have 2 data structures of size 10,000, and have to do roughly 200,000,000 operations. Better than billions, and no problem with pressing against memory limits.
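For concreteness, here is a minimal runnable Python sketch of that merge-style walk (the name closest_pairs is mine, not from the question, and the pointer loop is restructured slightly so the tail of list3 is handled in the same pass; it assumes the lists are already parsed into floats):

def closest_pairs(list1, list2, list3):
    """For each t in list3, return one-based indices (i, j) such that
    list1[i-1] + list2[j-1] is as close to t as possible."""
    # sort list2 and list3, remembering the original one-based positions
    l2 = sorted((v, j + 1) for j, v in enumerate(list2))
    l3 = sorted((v, c) for c, v in enumerate(list3))
    best = [None] * len(list3)  # best[c] = (error, index1, index2)
    for i, v1 in enumerate(list1, start=1):
        p2 = 0  # only ever moves forward within one pass over l3
        for v3, c in l3:  # targets in increasing order
            # advance in list2 while that improves v1 + l2 as approx of v3
            while (p2 + 1 < len(l2)
                   and abs(v1 + l2[p2 + 1][0] - v3) < abs(v1 + l2[p2][0] - v3)):
                p2 += 1
            err = abs(v1 + l2[p2][0] - v3)
            if best[c] is None or err < best[c][0]:
                best[c] = (err, i, l2[p2][1])
    return [(i, j) for _, i, j in best]

Because p2 never moves backwards, each pass over l3 costs O(k + n), giving O(m*(k + n)) after the two sorts, which is where the roughly 200,000,000 operations figure comes from. On the first test case from the question it prints one closest pair per element of list3 (ties may come out as the other, equally close pair, which the question explicitly allows):

list1 = [1.000002, 0.000002]
list2 = [0.500000, -0.500000]
list3 = [0.500001, 0.500002, 0.500003, 1.000000, 0.000001]
for i, j in closest_pairs(list1, list2, list3):
    print(i, j)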
Upvotes: 2