Reputation: 150
I have two 2D NumPy arrays, A and B, and I want to remove every row of A that also appears in B.
I tried something like this:
A[~np.isin(A, B)]
but isin returns an element-wise mask with the same shape as A, while I need one boolean value per row to filter it.
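Reducing that element-wise mask per row does not fix the problem, because np.isin(A, B) tests each value against every value of B regardless of position. In the minimal sketch below, the row [1, 3, 3] is a hypothetical example chosen for illustration. All of its values occur somewhere in B, so np.isin(A, B).all(axis=1) flags it even though it is not a row of B.

```python
import numpy as np

# Hypothetical data: the second row's values all occur in B, but not as a row of B
A = np.array([[3, 0, 4],
              [1, 3, 3],
              [3, 1, 1]])
B = np.array([[1, 1, 1],
              [3, 1, 1]])

# Element-wise membership: same shape as A, ignores column positions
elementwise = np.isin(A, B)
print(elementwise.shape)   # (3, 3)

# Collapsing per row is NOT a row-membership test: row 1 is a false positive
naive_rows = elementwise.all(axis=1)
print(naive_rows)          # [False  True  True]
```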
EDIT: for example, given
A = np.array([[3, 0, 4],
              [3, 1, 1],
              [0, 5, 9]])
B = np.array([[1, 1, 1],
              [3, 1, 1]])
the result should be
A = np.array([[3, 0, 4],
              [0, 5, 9]])
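One way to get exactly one boolean per row (a broadcasting technique that differs from the answers below, shown here as a sketch on the example arrays) is to compare every row of A with every row of B at once. Then .all(axis=-1) asks "do all columns match?" and .any(axis=-1) asks "does any row of B match?". The intermediate array has shape (len(A), len(B), n_cols), so memory grows with the product of the two row counts.

```python
import numpy as np

A = np.array([[3, 0, 4],
              [3, 1, 1],
              [0, 5, 9]])
B = np.array([[1, 1, 1],
              [3, 1, 1]])

# A[:, None, :] has shape (3, 1, 3) and B[None, :, :] has shape (1, 2, 3);
# the comparison broadcasts to shape (3, 2, 3)
matches = A[:, None, :] == B[None, :, :]

# True where row i of A equals row j of B in every column -> shape (3, 2)
row_equal = matches.all(axis=-1)

# True where row i of A equals at least one row of B -> shape (3,)
in_B = row_equal.any(axis=-1)

result = A[~in_B]
print(result)
# [[3 0 4]
#  [0 5 9]]
```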
Upvotes: 3
Views: 567
Reputation: 150
This is certainly not the most performant solution, but it is relatively easy to read:
A = np.array([row for row in A if row not in B])
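Edit:
I found that the code above does not work correctly: for a NumPy array, row in B is evaluated as (B == row).any(), which is True as soon as a single element matches in its column. This version compares whole rows and converts the result back to an array:
A = np.array([row for row in A if not np.equal(B, row).all(axis=1).any()])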
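To see the difference concretely with the sample arrays from the question: [3, 0, 4] shares only its first value with B's second row, yet the in operator reports it as present. The minimal sketch below contrasts that element-wise test with the row-wise check.

```python
import numpy as np

A = np.array([[3, 0, 4],
              [3, 1, 1],
              [0, 5, 9]])
B = np.array([[1, 1, 1],
              [3, 1, 1]])

row = A[0]  # [3, 0, 4]

# `in` on an ndarray reduces an element-wise comparison with .any()
found_by_in = row in B
found_spelled_out = bool((B == row).any())
print(found_by_in, found_spelled_out)   # True True

# Row-wise check: does some row of B match in every column?
found_as_row = bool(np.equal(B, row).all(axis=1).any())
print(found_as_row)                     # False
```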
Upvotes: 0
Reputation: 114230
Probably not the most performant solution, but it does exactly what you want. You can change the dtype of A and B so that each element is a unit consisting of one whole row. You need to ensure that the arrays are contiguous first, e.g. with ascontiguousarray, and that A and B have the same dtype, since the view of B reuses the dtype built from A:
Av = np.ascontiguousarray(A).view(np.dtype([('', A.dtype, A.shape[1])])).ravel()
Bv = np.ascontiguousarray(B).view(Av.dtype).ravel()
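As a quick check of what the view produces (a sketch on the sample arrays, assuming A and B share a dtype): each row becomes a single element of a structured dtype whose one field holds A.shape[1] values, so Av is one-dimensional and comparing its elements compares entire rows.

```python
import numpy as np

A = np.array([[3, 0, 4],
              [3, 1, 1],
              [0, 5, 9]])
B = np.array([[1, 1, 1],
              [3, 1, 1]])

# One unnamed structured field holding A.shape[1] values of A's dtype
row_dtype = np.dtype([('', A.dtype, A.shape[1])])
Av = np.ascontiguousarray(A).view(row_dtype).ravel()
Bv = np.ascontiguousarray(B).view(Av.dtype).ravel()

print(Av.shape, Bv.shape)   # (3,) (2,)

# Comparing against one structured element compares whole rows
print(Av == Bv[1])          # [False  True False]
```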
Now you can apply np.isin directly:
>>> np.isin(Av, Bv)
array([False, True, False])
According to the docs, invert=True is faster than negating the output of isin, so you can do
A[np.isin(Av, Bv, invert=True)]
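The steps above can be collected into a small helper. The function name remove_rows is hypothetical, and casting B to A's dtype is an added assumption so that both views share one byte layout; without it, arrays of different dtypes (e.g. int32 and int64) would fail to view or be reinterpreted incorrectly.

```python
import numpy as np

def remove_rows(A, B):
    """Hypothetical helper: return the rows of 2-D array A that do not occur in B."""
    A = np.ascontiguousarray(A)
    # Assumption: give B the same dtype as A so both views share one byte layout
    B = np.ascontiguousarray(B, dtype=A.dtype)
    if A.shape[1] != B.shape[1]:
        raise ValueError("A and B must have the same number of columns")
    # View each row as a single structured element
    row_dtype = np.dtype([('', A.dtype, A.shape[1])])
    Av = A.view(row_dtype).ravel()
    Bv = B.view(row_dtype).ravel()
    # invert=True yields "not in B" directly
    return A[np.isin(Av, Bv, invert=True)]

A = np.array([[3, 0, 4],
              [3, 1, 1],
              [0, 5, 9]])
B = np.array([[1, 1, 1],
              [3, 1, 1]], dtype=np.int32)
result = remove_rows(A, B)
print(result)
# [[3 0 4]
#  [0 5 9]]
```

Because the cast happens inside the helper, B may come in with a different integer dtype than A, as in the usage above.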
Upvotes: 2
Reputation: 310
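Try the following. It uses a matrix-vector product for dimensionality reduction: each row is treated as the digits of a mixed-radix number and collapsed to one integer, so np.isin can compare a single value per row. This assumes non-negative integer entries and codes small enough not to overflow the integer dtype:
import numpy as np
A = np.array([[3, 0, 4],
              [3, 1, 1],
              [0, 5, 9]])
B = np.array([[1, 1, 1],
              [3, 1, 1]])
# Radix of each column: one more than the largest value in that column of A or B
radix = np.maximum(A.max(0), B.max(0)) + 1
# Place values [1, r0, r0*r1, ...] give every row a unique code
weights = np.cumprod(np.r_[1, radix[:-1]])
print(A[~np.isin(A.dot(weights), B.dot(weights))])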
Output:
[[3 0 4]
[0 5 9]]
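The place values matter for this trick. If each column is weighted by its own radix (one more than the column maximum) instead of by the product of the radices of the preceding columns, distinct rows can map to the same number and be removed by mistake. The rows below are hypothetical, chosen only to produce such a collision; cumulative-product weights avoid it for non-negative integer entries, provided the codes fit in the integer type.

```python
import numpy as np

# Hypothetical data: no row of A appears in B
A = np.array([[3, 0],
              [0, 5]])
B = np.array([[0, 2]])

radix = np.maximum(A.max(0), B.max(0)) + 1   # [4, 6]

# Weighting each column by its own radix: 3*4 + 0*6 == 0*4 + 2*6 == 12
naive = A[~np.isin(A.dot(radix), B.dot(radix))]
print(naive)     # [[0 5]]  <- [3, 0] was removed by mistake

# Mixed-radix place values [1, 4]: every row gets a distinct code
weights = np.cumprod(np.r_[1, radix[:-1]])
correct = A[~np.isin(A.dot(weights), B.dot(weights))]
print(correct)
# [[3 0]
#  [0 5]]
```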
Upvotes: 1