user2505961

Reputation: 150

Filter rows in numpy array based on second array

I have two 2D NumPy arrays, A and B, and I want to remove all the rows of A that also appear in B.

I tried something like this:

A[~np.isin(A, B)]

but isin works element by element and returns a mask with the same shape as A, whereas I need one boolean value per row to filter it.

EDIT: for example, given:

A = np.array([[3, 0, 4],
              [3, 1, 1],
              [0, 5, 9]])
B = np.array([[1, 1, 1],
              [3, 1, 1]])

the desired result is:

A = np.array([[3, 0, 4],
              [0, 5, 9]])
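A small sketch of why the isin attempt falls short, and of one way to get a single boolean per row. The extra row [1, 3, 1] is added purely for illustration. The broadcasting comparison is a swapped-in technique, not one of the answers below. It builds a len(A) x len(B) x ncols boolean array, so it suits modestly sized inputs.

```python
import numpy as np

A = np.array([[3, 0, 4],
              [3, 1, 1],
              [1, 3, 1]])   # [1, 3, 1] added for illustration only
B = np.array([[1, 1, 1],
              [3, 1, 1]])

# np.isin checks each element on its own, so the mask has A's shape, (3, 3)
elementwise = np.isin(A, B)

# Collapsing it with .all(axis=1) is NOT a row test: it only asks whether
# every value in the row occurs somewhere in B, so [1, 3, 1] counts as present
naive = elementwise.all(axis=1)          # [False, True, True]

# Row test via broadcasting: (3, 1, 3) == (1, 2, 3) -> (3, 2, 3);
# all() over the columns, then any() over the rows of B
row_in_B = (A[:, None, :] == B[None, :, :]).all(axis=2).any(axis=1)  # [False, True, False]

print(A[~row_in_B])   # keeps [3, 0, 4] and [1, 3, 1]
```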

Upvotes: 3

Views: 567

Answers (3)

user2505961

Reputation: 150

This is certainly not the most performant solution, but it is relatively easy to read:

A = np.array([row for row in A if row not in B])

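Edit:

I found that the code above does not work correctly. For NumPy arrays, row in B is evaluated as (B == row).any(), so it is True as soon as any single element matches. This version compares whole rows instead: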

A = np.array([row for row in A if not np.equal(B, row).all(axis=1).any()])
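A short sketch of the pitfall described in the edit, using the arrays from the question. The helper row_in is a hypothetical name introduced here; it performs the same test as np.equal(B, row).all(axis=1).any().

```python
import numpy as np

A = np.array([[3, 0, 4],
              [3, 1, 1],
              [0, 5, 9]])
B = np.array([[1, 1, 1],
              [3, 1, 1]])

# ndarray's `in` evaluates (B == row).any(): True if ANY single element matches
print(np.array([3, 0, 4]) in B)   # True, only because B[1, 0] == 3

# Hypothetical helper: a row is present only if all its columns match some row of B
def row_in(row, B):
    return bool((B == row).all(axis=1).any())

print(row_in(np.array([3, 0, 4]), B))   # False
print(row_in(np.array([3, 1, 1]), B))   # True

# The loop solution, wrapped in np.array so the result is 2-D rather than a list
kept = np.array([row for row in A if not row_in(row, B)])
print(kept)   # rows [3, 0, 4] and [0, 5, 9]
```

One caveat of the list-comprehension approach: if every row is removed, np.array([]) has shape (0,) rather than (0, 3).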

Upvotes: 0

Mad Physicist

Reputation: 114230

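Probably not the most performant solution, but it does exactly what you want. You can view A and B with a structured dtype whose single field holds one entire row, so that each row becomes a single element. The view only works on C-contiguous data, so ensure the arrays are contiguous first, e.g. with np.ascontiguousarray. B must also have the same dtype and number of columns as A, since it is viewed with Av.dtype: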

Av = np.ascontiguousarray(A).view(np.dtype([('', A.dtype, A.shape[1])])).ravel()
Bv = np.ascontiguousarray(B).view(Av.dtype).ravel()

Now you can apply np.isin directly:

>>> np.isin(Av, Bv)
array([False,  True, False])

According to the docs, passing invert=True is equivalent to, but faster than, negating the output of isin, so you can write:

A[np.isin(Av, Bv, invert=True)]
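The steps above can be collected into one function. This is a sketch that assumes A and B are 2-D with the same dtype and the same number of columns, which the view of B with Av.dtype relies on; remove_rows is a hypothetical name.

```python
import numpy as np

def remove_rows(A, B):
    # Assumption: same dtype and same number of columns, otherwise the views differ
    assert A.dtype == B.dtype and A.shape[1] == B.shape[1]
    # Structured dtype with one field holding a whole row of A.shape[1] values
    row_dtype = np.dtype([('', A.dtype, A.shape[1])])
    # View each contiguous (n, k) array as n row-elements, then flatten to 1-D
    Av = np.ascontiguousarray(A).view(row_dtype).ravel()
    Bv = np.ascontiguousarray(B).view(row_dtype).ravel()
    # invert=True marks the rows of A that do NOT occur in B
    return A[np.isin(Av, Bv, invert=True)]

A = np.array([[3, 0, 4],
              [3, 1, 1],
              [0, 5, 9]])
B = np.array([[1, 1, 1],
              [3, 1, 1]])
print(remove_rows(A, B))   # rows [3, 0, 4] and [0, 5, 9]
```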

Upvotes: 2

peru_45

Reputation: 310

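Try the following. It uses matrix multiplication for dimensionality reduction: each row is encoded as a single integer, a mixed-radix number whose j-th digit is the value in column j, so one dot product turns every row into one scalar that np.isin can compare. This assumes nonnegative integer entries and codes small enough to fit in the integer dtype:

import numpy as np

A = np.array([[3, 0, 4],
              [3, 1, 1],
              [0, 5, 9]])
B = np.array([[1, 1, 1],
              [3, 1, 1]])

# Per-column base: one more than the largest value in that column of A or B
base = np.maximum(A.max(0), B.max(0)) + 1
# Mixed-radix weights 1, base[0], base[0]*base[1], ... so distinct rows get distinct codes
weights = np.cumprod(np.r_[1, base[:-1]])
print(A[~np.isin(A.dot(weights), B.dot(weights))])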

Output:

[[3 0 4]
 [0 5 9]]
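A small counterexample showing why the weights need to be cumulative products of the per-column bases rather than the bases themselves. The arrays are made up for illustration, and encode_rows is a hypothetical helper that assumes nonnegative integers whose codes fit in the integer dtype.

```python
import numpy as np

def encode_rows(M, base):
    # Mixed-radix weights: 1, base[0], base[0]*base[1], ...
    weights = np.cumprod(np.r_[1, base[:-1]])
    return M.dot(weights)

A = np.array([[3, 0],
              [0, 5]])
B = np.array([[0, 2]])

base = np.maximum(A.max(0), B.max(0)) + 1   # per-column bases [4, 6]

# Using the bases directly as weights: [3, 0] and [0, 2] both map to 12
naive_kept = A[~np.isin(A.dot(base), B.dot(base))]
print(naive_kept)   # wrongly drops [3, 0], which is not a row of B

# Cumulative-product weights [1, 4] give [3, 20] for A and [8] for B: no collision
kept = A[~np.isin(encode_rows(A, base), encode_rows(B, base))]
print(kept)         # keeps both rows of A
```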

Upvotes: 1
