jolly_cranberry

Reputation: 133

Fastest way to look for values in an array that are also in another array

I have two arrays: arrayA (3D), which I need to analyse, and arrayB (2D), which stores the values of interest. I want to return a third array with the same dimensions as arrayA that holds 1 wherever the value is one of the values of interest (in arrayB) and 0 otherwise.

This is the code that I am using:

arrayC = np.zeros(arrayA.shape)
for k in range(arrayA.shape[2]):
    for i in range(arrayA.shape[0]):
        for j in range(arrayA.shape[1]):
            # Row k of arrayB holds the values of interest for slice k of arrayA
            if arrayA[i][j][k] in arrayB[k]:
                arrayC[i][j][k] = 1

This takes several minutes for an arrayA of shape 1000x1000x10, and I need it to be much faster. I know I can speed it up by working on a flattened array and reshaping at the end, which I have already implemented (the code above is left unflattened just to make clear what is happening), but I am looking for a further improvement.

(I've also tried with np.where, but I cannot make it work with a range condition)

Upvotes: 1

Views: 84

Answers (2)

Serge Ballesta

Reputation: 148910

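You can obtain the expected result with np.isin and np.where, testing each slice along the last axis against the matching row of arrayB and stacking the results back along that axis:

arrayC = np.where(np.stack([np.isin(arrayA[:, :, k], arrayB[k])
                            for k in range(arrayA.shape[2])], axis=2), 1, 0)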
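
To see how the two calls combine on a single 2D slice, here is a minimal sketch. The names slice_a and values and the numbers in them are made up for illustration; they stand in for one slice of arrayA and one row of arrayB.

```python
import numpy as np

# Hypothetical stand-ins for one 2D slice of arrayA and one row of arrayB
slice_a = np.array([[1, 5],
                    [7, 3]])
values = np.array([5, 3, 9])

# np.isin returns a boolean mask with the same shape as slice_a
mask = np.isin(slice_a, values)

# np.where turns the mask into 1s and 0s
flags = np.where(mask, 1, 0)
print(flags)  # [[0 1]
              #  [0 1]]
```

If a boolean result is enough, the np.where step can be skipped and the mask used directly.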

Upvotes: 1

Mephy

Reputation: 2986

You can use numpy.isin:

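# a plays the role of arrayA (3D), b of arrayB (2D); c gets a's shape and dtype
c = np.zeros_like(a)
for k in range(a.shape[2]):
    # Flag every element of slice k that appears in row k of b
    c[:, :, k] = np.isin(a[:, :, k], b[k])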

In a rough test on my machine, for a with shape (1000, 1000, 10) and b with shape (1000, 500) this takes 1.55 seconds, while your version takes 50.3 seconds.
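
A self-contained sketch for checking that the per-slice np.isin version agrees with the triple loop from the question. The function names flag_with_isin and flag_with_loops, the uint8 output dtype, and the small random inputs are assumptions chosen for illustration, not the data used for the timings above.

```python
import numpy as np


def flag_with_isin(a, b):
    """Return an array shaped like `a` with 1 where a[i, j, k] is in b[k]."""
    c = np.zeros(a.shape, dtype=np.uint8)
    for k in range(a.shape[2]):
        # One vectorized membership test per slice along the last axis
        c[:, :, k] = np.isin(a[:, :, k], b[k])
    return c


def flag_with_loops(a, b):
    """Reference version: the element-by-element loop from the question."""
    c = np.zeros(a.shape, dtype=np.uint8)
    for k in range(a.shape[2]):
        for i in range(a.shape[0]):
            for j in range(a.shape[1]):
                if a[i, j, k] in b[k]:
                    c[i, j, k] = 1
    return c


# Small made-up inputs so the slow reference version finishes quickly
rng = np.random.default_rng(0)
a = rng.integers(0, 50, size=(20, 30, 4))
b = rng.integers(0, 50, size=(4, 5))

fast = flag_with_isin(a, b)
slow = flag_with_loops(a, b)
print(np.array_equal(fast, slow))
```

To time either function on larger inputs, wrapping the calls with time.perf_counter() is enough for a rough comparison.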

Upvotes: 1
