Joe Casey
Joe Casey

Reputation: 53

Indices of intersection between arrays

Is there a fast way to compare every element of an array against every element in a list of unique identifiers?

Using a for loop to loop through each of the unique values works but is way too slow to be usable. I have been searching for a vectorized solution but have not been successful. Any help would be greatly appreciated!

arrStart = []
startRavel = startInforce['pol_id'].ravel()
for policy in unique_policies:
    arrStart.append(np.argwhere(startRavel == policy))

Sample Input:

startRavel = [1,2,2,2,3,3]

unique_policies = [1,2,3]

Sample Output:

arrStart = [[0], [1,2,3],[4,5]]

The new array would have the same length as the unique values array but each element would be a list of all of the rows that match that unique value in the large array.

Upvotes: 3

Views: 138

Answers (1)

yatu
yatu

Reputation: 88226

Here's a vectorized solution:

import numpy as np
startRavel = np.array([1,2,2,2,3,3])
unique_policies = np.array([1,2,3])

Sort startRavel using np.argsort.

ix = np.argsort(startRavel)
s_startRavel = startRavel[ix]

Use np.searchsorted to find the indices in which unique_policies should be inserted in startRavel to mantain order:

s_ix = np.searchsorted(s_startRavel, unique_policies)
# array([0, 1, 4])

And then use np.split to split the array using the obtained indices. np.argsort is used again on s_ix to deal with non-sorted inputs:

ix_r = np.argsort(s_ix)
ixs = np.split(ix, s_ix[ix_r][1:])
np.array(ixs)[ix_r]
# [array([0]), array([1, 2, 3]), array([4, 5])]

General solution :

Lets wrap it all up in a function:

def ix_intersection(x, y):
    """
    Finds the indices where each unique
    value in x is found in y.
    Both x and y must be numpy arrays.
    ----------
    x: np.array
       Must contain unique values. 
       Values in x are assumed to be in y.
    y: np.array

    Returns
    -------
    Array of arrays. Each array contains the indices where a
    value in x is found in y
    """
    ix_y = np.argsort(y)
    s = np.searchsorted(y[ix_y], x)
    ix_r = np.argsort(s)
    ixs = np.split(ix_y, s[ix_r][1:])
    return np.array(ixs)[ix_r]

Other examples

Lets try with the following arrays:

startRavel = np.array([1,3,3,2,2,2])
unique_policies = np.array([1,2,3])

ix_intersection(unique_policies, startRavel)
# array([array([0]), array([3, 4, 5]), array([1, 2])])

Another example, this time with non-sorted inputs:

startRavel = np.array([1,3,3,2,2,2,5])
unique_policies = np.array([1,2,5,3])

ix_intersection(unique_policies, startRavel)
# array([array([0]), array([3, 4, 5]), array([6]), array([1, 2])])

Upvotes: 4

Related Questions