daniel451

Reputation: 10992

Membership checking in Numpy ndarray

I have written a script that evaluates whether some entry of arr is in check_elements. My approach does not compare single entries, but whole vectors inside arr. Thus, the script checks whether [8, 3], [4, 5], ... are in check_elements.

Here's an example:

import numpy as np

# arr.shape -> (2, 3, 2)
arr = np.array([[[8,  3],
                 [4,  5],
                 [6,  2]],

                [[9,  0],
                 [1, 10],
                 [7, 11]]])

# check_elements.shape -> (3, 2)
# generally: (n, 2)
check_elements = np.array([[4, 5], [9, 0], [7, 11]])

# rslt.shape -> (2, 3)
rslt = np.zeros((arr.shape[0], arr.shape[1]), dtype=bool)

for i, j in np.ndindex((arr.shape[0], arr.shape[1])):
    if arr[i, j] in check_elements:   # <-- condition is checked against
                                      #     the whole last dimension
        rslt[i, j] = True
    else:
        rslt[i, j] = False

Now:

print(rslt)

...would print:

[[False  True False]
 [ True False  True]]

To get the indices of the matching entries, I use:

print(np.transpose(np.nonzero(rslt)))

...which prints the following:

[[0 1]    # arr[0, 1] -> [4, 5] -> is in check_elements
 [1 0]    # arr[1, 0] -> [9, 0] -> is in check_elements
 [1 2]]   # arr[1, 2] -> [7, 11] -> is in check_elements

This task would be easy and performant if I were checking a condition on single values, like arr > 3 or np.where(...), but I am not interested in single values. I want to check a condition against the whole last dimension (or slices of it).

My question is: is there a faster way to achieve the same result? Am I right that vectorized attempts and things like np.where cannot be used for my problem, because they always operate on single values and not on a whole dimension or slices of that dimension?

Upvotes: 2

Views: 423

Answers (3)

Eelco Hoogendoorn

Reputation: 10759

The numpy_indexed package (disclaimer: I am its author) contains functionality to perform this kind of query; specifically, containment relations for nd (sub)arrays:

import numpy_indexed as npi
flatidx = npi.indices(arr.reshape(-1, 2), check_elements)
idx = np.unravel_index(flatidx, arr.shape[:-1])

Note that the implementation is fully vectorized under the hood.

Also note that with this approach, the order of the indices in idx matches the order of check_elements; the first entry in idx gives the row and column of the first item in check_elements. This information is lost with an approach along the lines you posted above, or with the alternative answers, which instead give you the indices sorted by their order of appearance in arr, which is often undesirable.
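For the example data, idx can also be turned back into a boolean mask equivalent to rslt from the question. A minimal sketch, assuming npi.indices returns, for each row of check_elements, its flat row index into arr.reshape(-1, 2):

mask = np.zeros(arr.shape[:-1], dtype=bool)  # shape (2, 3)
mask[idx] = True   # idx is a tuple of row/col index arrays from np.unravel_index
print(mask)
# [[False  True False]
#  [ True False  True]]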

Upvotes: 2

jotasi

Reputation: 5177

You can use np.in1d even though it is meant for 1D arrays by giving it 1D views of your arrays that contain one void-type element per row along the last axis:

arr_view = arr.view((np.void, arr.dtype.itemsize*arr.shape[-1])).ravel()
check_view = check_elements.view((np.void,
        check_elements.dtype.itemsize*check_elements.shape[-1])).ravel()

will give you two 1D arrays whose entries are void-type versions of your 2-element rows along the last axis. Now you can check which of the elements in arr_view are also in check_view by doing:

flatResult = np.in1d(arr_view, check_view)

This will give a flattened array, which you can then reshape to the shape of arr, dropping the last axis:

print(flatResult.reshape(arr.shape[:-1]))

which will give you the desired result:

array([[False,  True, False],
       [ True, False,  True]], dtype=bool)
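
From there, the matching indices can be recovered exactly as in the question:

rslt = flatResult.reshape(arr.shape[:-1])
print(np.transpose(np.nonzero(rslt)))
# [[0 1]
#  [1 0]
#  [1 2]]

Note that the view trick requires the data to be contiguous along the last axis; if arr is a non-contiguous slice, np.ascontiguousarray(arr) may be needed before taking the view.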

Upvotes: 1

Kasravnd

Reputation: 107287

Here is a Numpythonic approach using broadcasting:

>>> (check_elements == arr[:, :, None]).all(axis=-1).any(axis=-1)
array([[False,  True, False],
       [ True, False,  True]], dtype=bool)
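
Here arr[:, :, None] has shape (2, 3, 1, 2), so the comparison broadcasts against check_elements of shape (3, 2) into a (2, 3, 3, 2) boolean array; .all(axis=-1) requires a whole pair to match a check element, and .any(axis=-1) asks whether it matches any of them. The matching indices can then be recovered as in the question:

>>> np.argwhere((check_elements == arr[:, :, None]).all(axis=-1).any(axis=-1))
array([[0, 1],
       [1, 0],
       [1, 2]])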

Upvotes: 2
