ocean800

Reputation: 3737

Python - Select rows of array on certain condition?

So I have a 2D array data that looks like this:

[image of the array, not preserved]

I want to count the number of rows that satisfy a certain condition on the last two columns. For example, in this particular slice of the array, I only have

1 | 2 

But given that the third column is in range(1,4) and the fourth in range(0,3), I could have all of the following combinations:

1 | 0
1 | 1
1 | 2

2 | 0
2 | 1 
2 | 2

3 | 0 
3 | 1 
3 | 2 

I want to select the rows for which each of those conditions is true, but I'm not sure how to go about it. I've been working on this for the last 2 hours and have tried for loops, list comprehensions, etc., but it just gets more and more complicated and none of those approaches actually worked. Is there a good way to do this in numpy, or even just plain Python?

Any help would be greatly appreciated, thanks!!

Upvotes: 0

Views: 4286

Answers (2)

Mahdi

Reputation: 3238

This works:

import numpy as np
# data array 
data = np.array([[4,3,1,2],[4,3,5,1],[1,2,1,0]])
# array of acceptable combinations
cond = np.array([[1,0],[1,2]])
# index of rows matching the conditions
idx=np.array([any(np.equal(cond,row).all(1)) for row in data[:,2:]])
# selected rows
data[idx]
# array([[4, 3, 1, 2],
#        [1, 2, 1, 0]])
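
The same match can also be written without the Python-level loop by broadcasting the rows against the accepted pairs. This is only a sketch of an alternative, not part of the code above; the names pairs, matches, selected and counts are made up for illustration. It also gives the number of rows per accepted pair, which is what the question asks to count.

```python
import numpy as np

data = np.array([[4, 3, 1, 2], [4, 3, 5, 1], [1, 2, 1, 0]])
cond = np.array([[1, 0], [1, 2]])

# Compare every row's last two columns against every accepted pair.
# Shapes: (n_rows, 1, 2) == (1, n_cond, 2) -> (n_rows, n_cond, 2)
pairs = data[:, 2:]
matches = (pairs[:, None, :] == cond[None, :, :]).all(axis=2)

# A row is kept if it matches at least one accepted pair.
idx = matches.any(axis=1)
selected = data[idx]

# Number of rows matching each accepted pair, in the order of cond.
counts = matches.sum(axis=0)
```

For large arrays the intermediate (n_rows, n_cond, 2) boolean array can get big, so the list comprehension may still be the better choice when there are many accepted pairs.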

Upvotes: 1

hpaulj

Reputation: 231385

Boolean masking is a good general-purpose tool for selecting rows or columns (or elements) from an array based on one or more conditions.

Make an array with integers in the [0,10) range:

In [326]: arr=np.random.randint(0,10,(20,4))
In [327]: arr
Out[327]: 
array([[9, 4, 1, 1],
       [6, 1, 9, 6],
       [5, 3, 4, 9],
       [7, 4, 0, 4],
       [6, 2, 3, 5],
       [4, 5, 1, 8],
       [0, 9, 1, 3],
       [7, 7, 1, 5],
       [5, 9, 6, 6],
       [0, 9, 2, 1],
       [4, 9, 1, 6],
       [5, 1, 5, 2],
       [1, 5, 2, 0],
       [9, 0, 6, 5],
       [1, 9, 2, 4],
       [6, 7, 7, 9],
       [5, 2, 5, 4],
       [1, 6, 5, 9],
       [0, 4, 3, 1],
       [7, 7, 7, 7]])

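Find elements in the last 2 columns that are strictly between 0 and 3. Python allows chained tests like 0<x<3, but numpy arrays only support one-sided comparisons, which are then combined with & (or | for or). The parentheses are important because & binds more tightly than the comparison operators: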

In [328]: mask=(0<arr[:,2:]) & (arr[:,2:]<3)
In [329]: mask
Out[329]: 
array([[ True,  True],
       [False, False],
       [False, False],
       [False, False],
       [False, False],
       [ True, False],
       [ True, False],
       [ True, False],
       [False, False],
       [ True,  True],
       [ True, False],
       [False,  True],
       [ True, False],
       [False, False],
       [ True, False],
       [False, False],
       [False, False],
       [False, False],
       [False,  True],
       [False, False]], dtype=bool)
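
To see why the parentheses matter, here is a small sketch on a made-up 1-D array x (not the arr from the session). Without parentheses the expression turns into a chained comparison, which needs a single truth value for a whole array and fails.

```python
import numpy as np

x = np.array([0, 1, 2, 3])

# With parentheses: each comparison yields a boolean array, then & combines them.
ok = (0 < x) & (x < 3)

# Without parentheses, & binds tighter than <, so this parses as
# 0 < (x & x) < 3, a chained comparison that needs the truth value of an array.
try:
    bad = 0 < x & x < 3
    raised = False
except ValueError:
    raised = True
```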

Now we can select rows where either column is in the right range:

In [330]: arr[mask.any(axis=1),:]
Out[330]: 
array([[9, 4, 1, 1],
       [4, 5, 1, 8],
       [0, 9, 1, 3],
       [7, 7, 1, 5],
       [0, 9, 2, 1],
       [4, 9, 1, 6],
       [5, 1, 5, 2],
       [1, 5, 2, 0],
       [1, 9, 2, 4],
       [0, 4, 3, 1]])

or where both are:

In [331]: arr[mask.all(axis=1),:]
Out[331]: 
array([[9, 4, 1, 1],
       [0, 9, 2, 1]])
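
Since the question asks for a count of rows per combination, the same kind of mask can simply be summed. A minimal sketch on a small made-up array (the values are an assumption, not the asker's data); np.unique with axis=0 needs a reasonably recent numpy.

```python
import numpy as np

# Small made-up array; the last two columns play the role of the
# third and fourth columns in the question.
data = np.array([[4, 3, 1, 2],
                 [4, 3, 3, 1],
                 [1, 2, 1, 2],
                 [7, 7, 2, 0]])

# Count rows for one specific combination, here (1, 2).
mask = (data[:, 2] == 1) & (data[:, 3] == 2)
n_rows = int(mask.sum())

# Count rows for every combination that actually occurs.
combos, counts = np.unique(data[:, 2:], axis=0, return_counts=True)
```

Combinations that never occur in the data do not show up in combos, so their count is implicitly zero.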

np.where is often used to convert the boolean array into index numbers:

In [332]: np.where(mask.all(axis=1))
Out[332]: (array([0, 9], dtype=int32),)
In [333]: arr[_,:]
Out[333]: 
array([[[9, 4, 1, 1],
        [0, 9, 2, 1]]])
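
In the session above, _ is the whole tuple returned by np.where, which appears to be why the result picks up an extra leading dimension. A sketch of how to get the plain 2-D selection instead, using a small made-up array (the names rows, rows_flat and picked are for illustration only):

```python
import numpy as np

arr = np.array([[9, 4, 1, 1],
                [6, 1, 9, 6],
                [0, 9, 2, 1]])
mask = ((0 < arr[:, 2:]) & (arr[:, 2:] < 3)).all(axis=1)

# np.where returns a tuple with one index array per dimension.
(rows,) = np.where(mask)

# np.flatnonzero returns the same indices as a plain 1-D array.
rows_flat = np.flatnonzero(mask)

# Indexing with the index array gives the same 2-D result as the mask.
picked = arr[rows]
```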

Upvotes: 2
