A fast way to make a list of lists of differences from two original lists

Question

I have two lists. Both are sorted lists of numbers. Say:

A = [1.1, 5.2, 12.3, 12.6]
B = [2.3, 2.7, 5.2, 11.1, 12.1, 15.6, 16.6]

I would like to output in this case:

result = [[2.3], [0.4, 2.5], [5.9, 1]]

and an extra list:

remainder = [3.5, 1]

Where does this come from?

First consider the list of differences between consecutive values in B with an implicit zero added to the start.

[2.3, 0.4, 2.5, 5.9, 1, 3.5, 1]

We need to split this up according to where each value in A was closest to in B.

For each number in A, the nearest value in B is:

1.1 -> 2.3 -> 2.3
5.2 -> 5.2 -> 2.5
12.3 -> 12.1 -> 1
12.6 -> 12.1 -> 1

The rest goes into the remainder variable.

I am looking for a fast (linear time) way to do this in python. Any help very much appreciated. I don't mind if it uses numpy or not, whichever is faster.

My attempts:

I tried to solve this but via a convoluted route. First I make a mapping using:

def find_nearest(array, value):
    idx = np.searchsorted(array, value, transformed_remainderside="left")
    if idx > 0 and (idx == len(array) or math.fabs(value - array[idx-1]) < math.fabs(value - array[idx])):
        return array[idx-1]
    else:
        return array[idx]

Then I use that to make:

[[2.3], [2.7, 5.2], [11.1, 12.1]] and [15.6, 16.6]

Then I make:

[[2.3], [0.4, 2.9], [5.9, 6.9]] and [3.5, 4.5]

Then finally I make [[2.3], [0.4, 2.5], [5.9, 1]] and [3.5, 1]

This is painful and bug prone and also not linear time overall.

Added example

A = [2.3, 2.7, 5.2, 11.1]
B = [2.3, 2.7, 5.2, 11.1]

result = [[2.3], [0.4], [2.5], [5.9]]
remainder = []

Divakar · Accepted Answer

Here's one based on [np.searchsorted] -

# https://stackoverflow.com/a/45350318/ Variant for already sorted B
def closest_argmin_sortedB(A, sorted_B):
    L = len(sorted_B)
    sorted_idx = np.searchsorted(sorted_B, A)
    sorted_idx[sorted_idx==L] = L-1
    mask = (sorted_idx > 0) & \
    ((np.abs(A - sorted_B[sorted_idx-1]) < np.abs(A - sorted_B[sorted_idx])) )
    return sorted_idx-mask

A = np.asarray(A)
B = np.asarray(B)
d = np.ediff1d(B,to_begin=B[0])
idx = closest_argmin_sortedB(A,B)
idxf = idx[np.r_[True,idx[:-1]!=idx[1:]]]

p = np.split(d,idxf+1)
res,remainder = p[:-1],p[-1]

On bigger lists, to achieve performance boost, we can use zipping to slice and thus split the array/list data. Hence, the last two steps could be replaced by -

s = np.r_[0,idxf+1,len(d)]
res,remainder = [d[i:j] for (i,j) in zip(s[:-2],s[1:-1])], d[s[-2]:s[-1]]

A fast way to make a list of lists of differences from two original lists

Where does this come from?

Added example

Answers (2)

Related Questions