Jun

Reputation: 429

How do I remove rows from a numpy array based on multiple conditions?

I have a file with three columns and thousands of rows. I want to remove the rows whose first-column value falls within certain ranges. For example, if the data in my file is as follows:

18  6.215   0.025
19  6.203   0.025
20  6.200   0.025
21  6.205   0.025
22  6.201   0.026
23  6.197   0.026
24  6.188   0.024
25  6.187   0.023
26  6.189   0.021
27  6.188   0.020
28  6.192   0.019
29  6.185   0.020
30  6.189   0.019
31  6.191   0.018
32  6.188   0.019
33  6.187   0.019
34  6.194   0.021
35  6.192   0.024
36  6.193   0.024
37  6.187   0.026
38  6.184   0.026
39  6.183   0.027
40  6.189   0.027

I want to remove the rows whose first item is between 20 and 25 or between 30 and 35, endpoints included. The expected output is:

18  6.215   0.025
19  6.203   0.025
26  6.189   0.021
27  6.188   0.020
28  6.192   0.019
29  6.185   0.020
36  6.193   0.024
37  6.187   0.026
38  6.184   0.026
39  6.183   0.027
40  6.189   0.027

How could I do this?

Upvotes: 11

Views: 32735

Answers (4)

Rainald62

Reputation: 740

In the special but frequent case where the selection criterion is whether a value falls within an interval, I use the absolute value of its difference from the midpoint of the interval, especially if midInterval has a physical meaning:

data = data[abs(data[:,0] - midInterval) < deviation] # '<' for keeping the interval

If the data type is integer and the midpoint is not (as in Jun's question), you can double the values instead of converting to float (for huge integers, float rounding errors can exceed 1):

data = data[abs(2*data[:,0] - sumOfLimits) > deltaOfLimits]

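Repeat to remove two intervals. With the limits in Jun's question (sumOfLimits is 20 + 25 = 45 and 30 + 35 = 65, deltaOfLimits is 5 in both cases, and '>' also removes the endpoints, matching the expected output):

data = data[abs(2*data[:,0] - 45) > 5]
data = data[abs(2*data[:,0] - 65) > 5]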
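As a quick check of the doubled-midpoint trick, here is a minimal sketch that applies it to the first column from the question. The helper name outside_interval is made up for illustration, and the first column is assumed to hold the integers 18 to 40 as in the sample data.

```python
import numpy as np

# First column of the sample data in the question (integers 18..40).
col = np.arange(18, 41)

def outside_interval(values, low, high):
    """Return True where a value lies outside the closed interval [low, high].

    Hypothetical helper: |2x - (low + high)| > (high - low) is equivalent
    to x < low or x > high, and stays in integer arithmetic.
    """
    return np.abs(2 * values - (low + high)) > (high - low)

# Keep only the values that are outside both intervals.
mask = outside_interval(col, 20, 25) & outside_interval(col, 30, 35)
kept = col[mask]
print(kept)  # [18 19 26 27 28 29 36 37 38 39 40]
```

The same mask can index the full array (data[mask]) to drop whole rows.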

Upvotes: 2

Roger Fan

Reputation: 5045

If you want to keep using numpy, the solution isn't hard.

# >= and <= so that the endpoints 20, 25, 30 and 35 are removed too,
# as in the expected output
data = data[np.logical_not(np.logical_and(data[:,0] >= 20, data[:,0] <= 25))]
data = data[np.logical_not(np.logical_and(data[:,0] >= 30, data[:,0] <= 35))]

Or if you want to combine it all into one statement,

data = data[
    np.logical_not(np.logical_or(
        np.logical_and(data[:,0] >= 20, data[:,0] <= 25),
        np.logical_and(data[:,0] >= 30, data[:,0] <= 35)
    ))
]

To explain, conditional statements like data[:,0] < 25 create boolean arrays that track, element-by-element, where the condition in an array is true or false. In this case, it tells you where the first column of data is less than 25.

You can also index numpy arrays with these boolean arrays. A statement like data[data[:,0] > 30] extracts all the rows where data[:,0] > 30 is true, that is, all the rows whose first element is greater than 30. This kind of conditional indexing is how you extract the rows (or columns, or elements) that you want.

Finally, we need logical tools to combine boolean arrays element-by-element. Python's regular and, or, and not don't work here because they try to reduce each boolean array to a single truth value, which raises a ValueError for arrays with more than one element. Fortunately, numpy provides element-wise versions in the form of np.logical_and, np.logical_or, and np.logical_not. With these, we can combine our boolean arrays element-wise to find rows that satisfy more complicated conditions.
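For boolean arrays, the operators &, | and ~ do the same element-wise job as np.logical_and, np.logical_or and np.logical_not. The sketch below uses a handful of rows from the question and assumes inclusive bounds (as in the expected output); note the parentheses around each comparison, which are needed because & and | bind more tightly than >= and <=.

```python
import numpy as np

# A few rows from the question's sample data.
data = np.array([[19, 6.203, 0.025],
                 [20, 6.200, 0.025],
                 [25, 6.187, 0.023],
                 [26, 6.189, 0.021],
                 [30, 6.189, 0.019],
                 [36, 6.193, 0.024]])

first = data[:, 0]

# Element-wise masks for the two ranges to remove (inclusive bounds assumed).
in_first_range = (first >= 20) & (first <= 25)
in_second_range = (first >= 30) & (first <= 35)

# Keep the rows that are in neither range.
filtered = data[~(in_first_range | in_second_range)]
print(filtered[:, 0])  # [19. 26. 36.]

# Python's plain `and` asks for the truth value of a whole array and fails.
try:
    (first >= 20) and (first <= 25)
    raised = False
except ValueError:
    raised = True
print(raised)  # True
```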

Upvotes: 14

funk

Reputation: 2287

Below is my solution to the problem of deleting specific rows from a numpy array. It is a one-liner of the form:

#  Remove the rows whose first item is between 20 and 25
A = np.delete(A, np.where( np.bitwise_and( (A[:,0]>=20), (A[:,0]<=25) ) )[0], 0)

and is based on pure numpy functions (np.bitwise_and, np.where, np.delete).

import numpy as np

A = np.array( [   [ 18, 6.215, 0.025 ],
    [ 19, 6.203, 0.025 ],
    [ 20, 6.200, 0.025 ],
    [ 21, 6.205, 0.025 ],
    [ 22, 6.201, 0.026 ],
    [ 23, 6.197, 0.026 ],
    [ 24, 6.188, 0.024 ],
    [ 25, 6.187, 0.023 ],
    [ 26, 6.189, 0.021 ],
    [ 27, 6.188, 0.020 ],
    [ 28, 6.192, 0.019 ],
    [ 29, 6.185, 0.020 ],
    [ 30, 6.189, 0.019 ],
    [ 31, 6.191, 0.018 ],
    [ 32, 6.188, 0.019 ],
    [ 33, 6.187, 0.019 ],
    [ 34, 6.194, 0.021 ],
    [ 35, 6.192, 0.024 ],
    [ 36, 6.193, 0.024 ],
    [ 37, 6.187, 0.026 ],
    [ 38, 6.184, 0.026 ],
    [ 39, 6.183, 0.027 ],
    [ 40, 6.189, 0.027 ] ] )

#  Remove the rows whose first item is between 20 and 25
A = np.delete(A, np.where( np.bitwise_and( (A[:,0]>=20), (A[:,0]<=25) ) )[0], 0)

# Remove the rows whose first item is between 30 and 35
A = np.delete(A, np.where( np.bitwise_and( (A[:,0]>=30), (A[:,0]<=35) ) )[0], 0)

>>> A
array([[  1.80000000e+01,   6.21500000e+00,   2.50000000e-02],
       [  1.90000000e+01,   6.20300000e+00,   2.50000000e-02],
       [  2.60000000e+01,   6.18900000e+00,   2.10000000e-02],
       [  2.70000000e+01,   6.18800000e+00,   2.00000000e-02],
       [  2.80000000e+01,   6.19200000e+00,   1.90000000e-02],
       [  2.90000000e+01,   6.18500000e+00,   2.00000000e-02],
       [  3.60000000e+01,   6.19300000e+00,   2.40000000e-02],
       [  3.70000000e+01,   6.18700000e+00,   2.60000000e-02],
       [  3.80000000e+01,   6.18400000e+00,   2.60000000e-02],
       [  3.90000000e+01,   6.18300000e+00,   2.70000000e-02],
       [  4.00000000e+01,   6.18900000e+00,   2.70000000e-02]])
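If there are many ranges, the two np.delete calls could be folded into one pass. The following is only a sketch: remove_rows_in_ranges and its parameters are names invented here, the bounds are assumed inclusive, and the demo array holds just a few rows from the question.

```python
import numpy as np

def remove_rows_in_ranges(arr, ranges, column=0):
    """Return a copy of arr without the rows whose value in `column`
    falls inside any (low, high) pair in `ranges` (bounds inclusive).

    Hypothetical helper, not part of numpy.
    """
    values = arr[:, column]
    hit = np.zeros(len(arr), dtype=bool)
    for low, high in ranges:
        # Accumulate the rows matched by each range.
        hit |= (values >= low) & (values <= high)
    # np.flatnonzero turns the boolean mask into row indices for np.delete.
    return np.delete(arr, np.flatnonzero(hit), axis=0)

# A few rows from the question's sample data.
A = np.array([[18, 6.215, 0.025],
              [20, 6.200, 0.025],
              [25, 6.187, 0.023],
              [26, 6.189, 0.021],
              [33, 6.187, 0.019],
              [40, 6.189, 0.027]])

result = remove_rows_in_ranges(A, [(20, 25), (30, 35)])
print(result[:, 0])  # [18. 26. 40.]
```

np.delete returns a new array rather than modifying A in place, so the result has to be assigned.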

Upvotes: 3

Jakob

Reputation: 1129

You don't need to add complexity with numpy for this. I'm guessing you're reading your file into a list of lists (each row being a list within the overall data list, like this: [[18, 6.215, 0.025], [19, 6.203, 0.025], ...]). In that case, build a new list that keeps only the rows outside both ranges. Calling data.remove(row) while looping over data would skip the element that follows each removed row:

data = [row for row in data
        if not (20 <= row[0] <= 25 or 30 <= row[0] <= 35)]
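To see why removing items from a list while iterating over it is risky, here is a small comparison on a few rows from the question. It assumes the data is a plain list of lists and that the bounds are inclusive; the variable names are only for illustration.

```python
rows = [[19, 6.203, 0.025],
        [21, 6.205, 0.025],
        [22, 6.201, 0.026],
        [23, 6.197, 0.026],
        [26, 6.189, 0.021]]

def in_removed_range(row):
    # Inclusive bounds, matching the expected output in the question.
    return 20 <= row[0] <= 25 or 30 <= row[0] <= 35

# Removing during iteration: after each removal the next element shifts
# into the current position and the loop steps over it.
mutated = list(rows)
for row in mutated:
    if in_removed_range(row):
        mutated.remove(row)
print([row[0] for row in mutated])  # [19, 22, 26]

# Building a new list visits every row exactly once.
filtered = [row for row in rows if not in_removed_range(row)]
print([row[0] for row in filtered])  # [19, 26]
```

The row starting with 22 survives the first loop even though it is inside the range, which is the kind of silent error the list comprehension avoids.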

Upvotes: -1
