Reputation: 21174
I have two lists. Both are sorted lists of numbers. Say:
A = [1.1, 5.2, 12.3, 12.6]
B = [2.3, 2.7, 5.2, 11.1, 12.1, 15.6, 16.6]
I would like to output in this case:
result = [[2.3], [0.4, 2.5], [5.9, 1]]
and an extra list:
remainder = [3.5, 1]
First consider the list of differences between consecutive values in B with an implicit zero added to the start.
[2.3, 0.4, 2.5, 5.9, 1, 3.5, 1]
We need to split this up according to where each value in A was closest to in B.
For each number in A, the nearest value in B is:
1.1 -> 2.3 -> 2.3
5.2 -> 5.2 -> 2.5
12.3 -> 12.1 -> 1
12.6 -> 12.1 -> 1
The rest goes into the remainder variable.
I am looking for a fast (linear time) way to do this in python. Any help very much appreciated. I don't mind if it uses numpy or not, whichever is faster.
My attempts:
I tried to solve this but via a convoluted route. First I make a mapping using:
def find_nearest(array, value):
idx = np.searchsorted(array, value, transformed_remainderside="left")
if idx > 0 and (idx == len(array) or math.fabs(value - array[idx-1]) < math.fabs(value - array[idx])):
return array[idx-1]
else:
return array[idx]
Then I use that to make:
[[2.3], [2.7, 5.2], [11.1, 12.1]] and [15.6, 16.6]
Then I make:
[[2.3], [0.4, 2.9], [5.9, 6.9]] and [3.5, 4.5]
Then finally I make [[2.3], [0.4, 2.5], [5.9, 1]] and [3.5, 1]
This is painful and bug prone and also not linear time overall.
A = [2.3, 2.7, 5.2, 11.1]
B = [2.3, 2.7, 5.2, 11.1]
result = [[2.3], [0.4], [2.5], [5.9]]
remainder = []
Upvotes: 2
Views: 160
Reputation: 23770
This can be done in a very explicit way by splitting the task into two parts: matching the closest number and building the ranges.
Firstly, the code goes through both arrays linearly and chooses the closest number in B for each number in A. Then, the code transforms the structure to the required output of ranges of adjacent numbers and filters out the ranges without any match:
import numpy as np
A = [1.1, 5.2, 12.3, 12.6]
B = [2.3, 2.7, 5.2, 11.1, 12.1, 15.6, 16.6]
# This array will hold the closest numbers in A for each number in B
matches = [[] for _ in B]
i = 0
for num in A:
# Check if the current number in B is the closest to the current one
# This assumes both arrays are sorted
while i < len(B) - 1 and abs(num - B[i]) > abs(num - B[i + 1]):
i += 1
matches[i].append(num)
# Unite the pairs so each range has a list of matching numbers
matches = [[matches[0]]] + [l1+l2 for l1, l2 in zip(matches[1::2], matches[2::2])]
# Create a list of diffs and pair them into ranges
diffs = (np.array(B[1:]) - np.array(B[:-1])).tolist()
ranges = [[B[0]]] + list(map(list, zip(diffs[::2], diffs[1::2])))
# Output only the ranges that had at least a single match in A
ranges_with_numbers = [num_range for num_range, range_matches in zip(ranges, matches)
if len(range_matches) > 0]
remainder = [num_range for num_range, range_matches in zip(ranges, matches)
if len(range_matches) == 0]
The complexity is O(n) since the matching phase scans each array just once, and so is the transform phase.
Upvotes: 1
Reputation: 221534
Here's one based on [np.searchsorted
] -
# https://stackoverflow.com/a/45350318/ Variant for already sorted B
def closest_argmin_sortedB(A, sorted_B):
L = len(sorted_B)
sorted_idx = np.searchsorted(sorted_B, A)
sorted_idx[sorted_idx==L] = L-1
mask = (sorted_idx > 0) & \
((np.abs(A - sorted_B[sorted_idx-1]) < np.abs(A - sorted_B[sorted_idx])) )
return sorted_idx-mask
A = np.asarray(A)
B = np.asarray(B)
d = np.ediff1d(B,to_begin=B[0])
idx = closest_argmin_sortedB(A,B)
idxf = idx[np.r_[True,idx[:-1]!=idx[1:]]]
p = np.split(d,idxf+1)
res,remainder = p[:-1],p[-1]
On bigger lists, to achieve performance boost, we can use zipping to slice and thus split the array/list data. Hence, the last two steps could be replaced by -
s = np.r_[0,idxf+1,len(d)]
res,remainder = [d[i:j] for (i,j) in zip(s[:-2],s[1:-1])], d[s[-2]:s[-1]]
Upvotes: 1