Reputation: 6187
I have a very large 2D NumPy array (a few columns, but billions of rows). As the program runs, thousands more of these arrays are generated.
For each one, I'd like to remove all rows that contain certain values in certain positions. For example, if I had:
import numpy as np
arr = np.array([
[10, 1, 1, 1],
[1, 2, 1, 2],
[1, 2, 1, 2],
[3, 1, 1, 1],
[2, 2, 1, 2],
[3, 4, 2, 7],
[3, 2, 1, 9],
[3, 2, 2, 2],
])
I'd like to remove all rows that contain the value 2 in both positions 1 and 3 (zero-based column indices), so that I would end up with:
print(arr)
>>> ([
[10, 1, 1, 1],
[3, 1, 1, 1],
[3, 4, 2, 7],
[3, 2, 1, 9],
]),
Because I have so many of these large 2D arrays, I'm trying to do this with a NumPy call so that it runs in C, rather than iterating over and selecting rows in Python, which is much slower.
Is there a NumPy way of accomplishing this?
Thanks!
Eduardo
Upvotes: 0
Views: 125
Reputation: 214927
You can use boolean array indexing: select the 2nd and 4th columns (indices 1 and 3), then keep only the rows where at least one of those values is not equal to 2:
arr[(arr[:, [1, 3]] != 2).any(axis=1)]
array([[10, 1, 1, 1],
[ 3, 1, 1, 1],
[ 3, 4, 2, 7],
[ 3, 2, 1, 9]])
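For arrays with billions of rows, it may be worth noting that `arr[:, [1, 3]]` uses integer-list indexing, which copies the selected columns before comparing them. A minimal sketch of a variant that builds the mask one column at a time is shown below. The helper name `drop_rows_matching` and its parameters are hypothetical, introduced only for illustration, and whether it actually lowers peak memory or runs faster on your data is an assumption you would need to measure.

```python
import numpy as np


def drop_rows_matching(arr, cols, value):
    """Return the rows of arr in which NOT every column in cols equals value.

    Hypothetical helper, not part of NumPy.
    """
    # Start from "every row matches", then AND in one column at a time.
    # The Python loop runs over the few columns, not over the rows, so the
    # per-row work is still done inside NumPy.
    matches = np.ones(arr.shape[0], dtype=bool)
    for c in cols:
        matches &= arr[:, c] == value
    # Boolean indexing copies the surviving rows into a new array.
    return arr[~matches]


arr = np.array([
    [10, 1, 1, 1],
    [1, 2, 1, 2],
    [1, 2, 1, 2],
    [3, 1, 1, 1],
    [2, 2, 1, 2],
    [3, 4, 2, 7],
    [3, 2, 1, 9],
    [3, 2, 2, 2],
])

result = drop_rows_matching(arr, cols=(1, 3), value=2)
print(result)
```

On the sample array this selects the same four rows as the one-liner above. Either way, the result is a new array rather than a view, so the filtered rows are copied once.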
Upvotes: 1