renardo

Reputation: 87

Deleting rows based on the value found in a specific column

I am trying to write code that searches a numpy array for rows where the value in the fifth column is not 50. Any such row should be removed.

This is what I have so far:

for rows in range(len(b)):
    if b[:,4].any() != 50:
        b = np.delete(b, b[rows])

However, I keep getting the following error:

too many indices for array

Upvotes: 2

Views: 69

Answers (2)

hpaulj

Reputation: 231738

Let's run the calculation with some diagnostic prints. Note where the error occurs. That's important! (We shouldn't just keep trying things without isolating the problem!)

In [2]: b=np.array([[0,1,2],[1,2,3],[2,1,2]])
In [3]: for row in range(len(b)):
   ...:     print(row)
   ...:     if b[:,2].any() !=2:
   ...:         print(b[row])
   ...:         b = np.delete(b, b[row])
   ...:         
0
[0 1 2]
1
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-3-04dc188d9a2b> in <module>()
      1 for row in range(len(b)):
      2     print(row)
----> 3     if b[:,2].any() !=2:
      4         print(b[row])
      5         b = np.delete(b, b[row])
IndexError: too many indices for array

So the error occurs on the 2nd iteration (row 1). Something is wrong with b after the delete. What is the new value of b?

In [4]: b
Out[4]: array([1, 2, 3, 2, 1, 2])

b is a 1d array, not the 2d one we started with. That explains the error, right? Something must be wrong with the use of delete. Maybe we need to check its documentation?

Look at the axis parameter:

axis : int, optional
  The axis along which to delete the subarray defined by `obj`.
  If `axis` is None, `obj` is applied to the flattened array.

We didn't specify an axis, so the delete was applied to the flattened array, and the result was flattened - 1d.
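To make the effect of the axis parameter concrete, here is a minimal sketch on the same 3x3 array. It passes a plain integer row index (not the `b[row]` values used in the loop), and the names `flat` and `rows` are only illustrative:

```python
import numpy as np

b = np.array([[0, 1, 2], [1, 2, 3], [2, 1, 2]])

# No axis: the array is flattened first, then element 0 is removed.
flat = np.delete(b, 0)

# axis=0: the whole row with index 0 is removed and the result stays 2d.
rows = np.delete(b, 0, axis=0)

print(flat.shape)  # (8,)
print(rows.shape)  # (2, 3)
```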

But even if I specify an axis I get an error (I won't get into that), which prompts me to look more carefully at the if condition:

In [10]: b[:,2]
Out[10]: array([2, 3, 2])
In [11]: b[:,2].any()
Out[11]: True
In [12]: b[:,2]!=2
Out[12]: array([False,  True, False])

Applying any to the column doesn't make sense - it just checks whether any values in the column are nonzero. Instead we want to test the column against the target, getting a boolean array that matches the column in size.

We can use that boolean array directly as a row selection mask:

In [13]: b[_,:]
Out[13]: array([[1, 2, 3]])

No need to iterate.

There is another problem with your iteration. You iterate on range(3), that is [0,1,2]. But inside the loop you try to remove a row from b, changing the size of b. That's going to cause problems when you try to index b[row] by number, right? When iterating, in Python or numpy, be careful about modifying the object that you are iterating over.

Sorry to be long-winded about this, but it looks like you need some basic debugging guidance.


Here's a basic list approach:

In [15]: [row for row in b if row[2]!=2]
Out[15]: [array([1, 2, 3])]

I'm iterating on the rows, not their indices, and for each row checking the column value, and keeping that row if the check is True. We could do that with np.delete, but a list comprehension is clearer (and faster).
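For comparison, the np.delete route could look something like the sketch below. It assumes the same 3x3 array, works out the row indices up front so that nothing is modified while looping, and passes axis=0 in a single call. The names `drop` and `kept` are only illustrative:

```python
import numpy as np

b = np.array([[0, 1, 2], [1, 2, 3], [2, 1, 2]])

# Indices of the rows to drop: here, rows whose third column equals 2,
# so the result matches the list comprehension above.
drop = np.flatnonzero(b[:, 2] == 2)

# One call with an explicit axis, so the result stays 2d.
kept = np.delete(b, drop, axis=0)

print(kept)  # [[1 2 3]]
```

Unlike the list comprehension, this returns a 2d array rather than a list of 1d arrays.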

Upvotes: 1

Cleb

Reputation: 26069

It would be better to provide b and the desired output, but if I understand it correctly, you could use:

import numpy as np

b = np.array([[50, 2, 3, 4, 5, 6],
              [4, 50, 6, 7, 8, 9],
              [1, 1, 1, 1, 50, 9]])


array([[50,  2,  3,  4,  5,  6],
       [ 4, 50,  6,  7,  8,  9],
       [ 1,  1,  1,  1, 50,  9]])

Then you can check which rows contain 50 in the 5th column using

b[:, 4] == 50
array([False, False,  True])

and feed this Boolean array back to b to select the desired rows:

b[b[:, 4] == 50]

which leaves you with one row in this case

array([[ 1,  1,  1,  1, 50,  9]])
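If this filtering is needed in more than one place, it can be wrapped in a small function. The helper name `keep_rows_where` below is hypothetical, not part of numpy; this is just a sketch of the same masking idea. Boolean indexing returns a new array, so the original b is left unchanged:

```python
import numpy as np

def keep_rows_where(arr, col, value):
    """Return a new array holding only the rows whose `col` entry equals `value`."""
    return arr[arr[:, col] == value]

b = np.array([[50, 2, 3, 4, 5, 6],
              [4, 50, 6, 7, 8, 9],
              [1, 1, 1, 1, 50, 9]])

filtered = keep_rows_where(b, 4, 50)

print(filtered.shape)  # (1, 6)
print(b.shape)         # (3, 6)
```

To replace b itself, assign the result back: b = keep_rows_where(b, 4, 50).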

Upvotes: 0
