Simd
Simd

Reputation: 21174

A fast way to make a list of lists of differences from two original lists

I have two lists. Both are sorted lists of numbers. Say:

A = [1.1, 5.2, 12.3, 12.6]
B = [2.3, 2.7, 5.2, 11.1, 12.1, 15.6, 16.6]

I would like to output in this case:

result = [[2.3], [0.4, 2.5], [5.9, 1]]

and an extra list:

remainder = [3.5, 1]

Where does this come from?

First consider the list of differences between consecutive values in B with an implicit zero added to the start.

[2.3, 0.4, 2.5, 5.9, 1, 3.5, 1]

We need to split this up according to where each value in A was closest to in B.

For each number in A, the nearest value in B is:

1.1 -> 2.3 -> 2.3
5.2 -> 5.2 -> 2.5
12.3 -> 12.1 -> 1
12.6 -> 12.1 -> 1

The rest goes into the remainder variable.

I am looking for a fast (linear time) way to do this in python. Any help very much appreciated. I don't mind if it uses numpy or not, whichever is faster.


My attempts:

I tried to solve this but via a convoluted route. First I make a mapping using:

def find_nearest(array, value):
    idx = np.searchsorted(array, value, transformed_remainderside="left")
    if idx > 0 and (idx == len(array) or math.fabs(value - array[idx-1]) < math.fabs(value - array[idx])):
        return array[idx-1]
    else:
        return array[idx]

Then I use that to make:

[[2.3], [2.7, 5.2], [11.1, 12.1]] and [15.6, 16.6]

Then I make:

[[2.3], [0.4, 2.9], [5.9, 6.9]] and [3.5, 4.5]

Then finally I make [[2.3], [0.4, 2.5], [5.9, 1]] and [3.5, 1]

This is painful and bug prone and also not linear time overall.


Added example

A = [2.3, 2.7, 5.2, 11.1]
B = [2.3, 2.7, 5.2, 11.1]

result = [[2.3], [0.4], [2.5], [5.9]]
remainder = []

Upvotes: 2

Views: 160

Answers (2)

Elisha
Elisha

Reputation: 23770

This can be done in a very explicit way by splitting the task into two parts: matching the closest number and building the ranges.

Firstly, the code goes through both arrays linearly and chooses the closest number in B for each number in A. Then, the code transforms the structure to the required output of ranges of adjacent numbers and filters out the ranges without any match:

import numpy as np

A = [1.1, 5.2, 12.3, 12.6]
B = [2.3, 2.7, 5.2, 11.1, 12.1, 15.6, 16.6]

# This array will hold the closest numbers in A for each number in B
matches = [[] for _ in B]

i = 0
for num in A:
    # Check if the current number in B is the closest to the current one
    # This assumes both arrays are sorted
    while i < len(B) - 1 and abs(num - B[i]) > abs(num - B[i + 1]):
        i += 1
    matches[i].append(num)

# Unite the pairs so each range has a list of matching numbers
matches = [[matches[0]]] + [l1+l2 for l1, l2 in zip(matches[1::2], matches[2::2])]

# Create a list of diffs and pair them into ranges
diffs = (np.array(B[1:]) - np.array(B[:-1])).tolist()
ranges = [[B[0]]] + list(map(list, zip(diffs[::2], diffs[1::2])))

# Output only the ranges that had at least a single match in A
ranges_with_numbers = [num_range for num_range, range_matches in zip(ranges, matches)
                       if len(range_matches) > 0]
remainder = [num_range for num_range, range_matches in zip(ranges, matches)
             if len(range_matches) == 0]

The complexity is O(n) since the matching phase scans each array just once, and so is the transform phase.

Upvotes: 1

Divakar
Divakar

Reputation: 221534

Here's one based on [np.searchsorted] -

# https://stackoverflow.com/a/45350318/ Variant for already sorted B
def closest_argmin_sortedB(A, sorted_B):
    L = len(sorted_B)
    sorted_idx = np.searchsorted(sorted_B, A)
    sorted_idx[sorted_idx==L] = L-1
    mask = (sorted_idx > 0) & \
    ((np.abs(A - sorted_B[sorted_idx-1]) < np.abs(A - sorted_B[sorted_idx])) )
    return sorted_idx-mask

A = np.asarray(A)
B = np.asarray(B)
d = np.ediff1d(B,to_begin=B[0])
idx = closest_argmin_sortedB(A,B)
idxf = idx[np.r_[True,idx[:-1]!=idx[1:]]]

p = np.split(d,idxf+1)
res,remainder = p[:-1],p[-1]

On bigger lists, to achieve performance boost, we can use zipping to slice and thus split the array/list data. Hence, the last two steps could be replaced by -

s = np.r_[0,idxf+1,len(d)]
res,remainder = [d[i:j] for (i,j) in zip(s[:-2],s[1:-1])], d[s[-2]:s[-1]]

Upvotes: 1

Related Questions