How to remove certain rows from a 2D NumPy array when they match a given criterion?

Question

I have a very large 2D NumPy array (a few columns but billions of rows). As the program runs, thousands more of these arrays are generated.

For each one, I'd like to remove all rows that contain certain values in certain positions. For example, if I had:

import numpy as np

arr = np.array([
    [10, 1, 1, 1],
    [1, 2, 1, 2],
    [1, 2, 1, 2],
    [3, 1, 1, 1],
    [2, 2, 1, 2],
    [3, 4, 2, 7],
    [3, 2, 1, 9],
    [3, 2, 2, 2],
])

I'd like to remove all rows that contain the value 2 in both positions 1 and 3 (zero-based column indices), so that I would end up with:

print(arr)
[[10  1  1  1]
 [ 3  1  1  1]
 [ 3  4  2  7]
 [ 3  2  1  9]]

Because I have such large 2d arrays and so many of them, I'm trying to do this with a Numpy call so that it runs in C, instead of iterating and selecting rows in Python which is much, much slower.

Is there a Numpy way of accomplishing this?

Thanks!

Eduardo

akuiper · Accepted Answer

You can use boolean array indexing: select the 2nd and 4th columns (indices 1 and 3), then keep only the rows where at least one of those two values is not equal to 2:

arr[(arr[:, [1,3]] != 2).any(1)]
array([[10,  1,  1,  1],
       [ 3,  1,  1,  1],
       [ 3,  4,  2,  7],
       [ 3,  2,  1,  9]])
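
For reuse across many arrays, the same mask can be wrapped in a small helper. This is only a sketch: the function names `drop_rows_matching` and `drop_rows_matching_two_cols` and their parameters are hypothetical, not part of NumPy. The second variant assumes exactly two columns and compares them separately, which avoids the temporary copy that `arr[:, [1, 3]]` creates through fancy indexing; that may matter for arrays with billions of rows.

```python
import numpy as np


def drop_rows_matching(arr, cols, value):
    """Return a copy of arr without rows whose entries in `cols` all equal `value`."""
    # True for each row where every selected column equals `value`.
    matches = (arr[:, cols] == value).all(axis=1)
    # Invert the mask so that only non-matching rows are kept.
    return arr[~matches]


def drop_rows_matching_two_cols(arr, col_a, col_b, value):
    """Two-column variant that compares column views instead of a fancy-indexed copy."""
    keep = (arr[:, col_a] != value) | (arr[:, col_b] != value)
    return arr[keep]


arr = np.array([
    [10, 1, 1, 1],
    [1, 2, 1, 2],
    [1, 2, 1, 2],
    [3, 1, 1, 1],
    [2, 2, 1, 2],
    [3, 4, 2, 7],
    [3, 2, 1, 9],
    [3, 2, 2, 2],
])

result = drop_rows_matching(arr, [1, 3], 2)
result_two = drop_rows_matching_two_cols(arr, 1, 3, 2)
print(result)
```

Both helpers return a new array, because boolean mask indexing always copies the selected rows; the original `arr` is left unchanged.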
