Reputation: 3737
So I have a 2D array, data, that looks like this:
I want to count the number of rows that satisfy a certain condition on the last two columns. For example, in this particular slice of the array, I only have
1 | 2
But given that the third column is in range(1,4) and the fourth in range(0,3), I could have all of the following combinations:
1 | 0
1 | 1
1 | 2
2 | 0
2 | 1
2 | 2
3 | 0
3 | 1
3 | 2
I want to select the rows for which each of those conditions is true. But I'm not sure how to go about it? I've been working on this for the last 2 hours and I've come up with things using for loops, list comprehensions, etc. But it just gets more and more complicated and none of those ways actually worked. Is there a good way to do this in numpy, or even just plain python?
Any help would be greatly appreciated, thanks!!
Upvotes: 0
Views: 4286
Reputation: 3238
This works:
import numpy as np
# data array
data = np.array([[4,3,1,2],[4,3,5,1],[1,2,1,0]])
# array of acceptable combinations
cond = np.array([[1,0],[1,2]])
# index of rows matching the conditions
idx=np.array([any(np.equal(cond,row).all(1)) for row in data[:,2:]])
# selected rows
data[idx]
# array([[4, 3, 1, 2],
#        [1, 2, 1, 0]])
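If the list of acceptable combinations gets long, the Python-level loop can be swapped for broadcasting. The following is only a sketch of that idea, not a replacement for the code above: it assumes the ranges from the question (range(1,4) and range(0,3)), and the names matches, selected and count are made up for illustration.

```python
import itertools
import numpy as np

# same example data as above
data = np.array([[4, 3, 1, 2], [4, 3, 5, 1], [1, 2, 1, 0]])

# every (third, fourth) pair allowed by the ranges in the question
cond = np.array(list(itertools.product(range(1, 4), range(0, 3))))

# compare the last two columns of each row against every pair at once:
# (n_rows, 1, 2) == (1, n_pairs, 2) broadcasts to (n_rows, n_pairs, 2)
matches = (data[:, None, 2:] == cond[None, :, :]).all(axis=2)

idx = matches.any(axis=1)   # True where a row matches at least one pair
selected = data[idx]        # the matching rows
count = int(idx.sum())      # how many rows matched
```

Summing matches over axis 0 instead (matches.sum(axis=0)) should give the number of rows per combination, if a per-pair count is what is needed.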
Upvotes: 1
Reputation: 231385
Boolean masking is a good general purpose tool for selecting rows or columns (or elements) from an array based on one or more conditions.
Make an array with integers in the [0,10) range:
In [326]: arr=np.random.randint(0,10,(20,4))
In [327]: arr
Out[327]:
array([[9, 4, 1, 1],
[6, 1, 9, 6],
[5, 3, 4, 9],
[7, 4, 0, 4],
[6, 2, 3, 5],
[4, 5, 1, 8],
[0, 9, 1, 3],
[7, 7, 1, 5],
[5, 9, 6, 6],
[0, 9, 2, 1],
[4, 9, 1, 6],
[5, 1, 5, 2],
[1, 5, 2, 0],
[9, 0, 6, 5],
[1, 9, 2, 4],
[6, 7, 7, 9],
[5, 2, 5, 4],
[1, 6, 5, 9],
[0, 4, 3, 1],
[7, 7, 7, 7]])
Find elements in the last 2 columns that are strictly between 0 and 3. Python allows chained tests like 0<x<3, but numpy only allows one-sided ones, so the two comparisons are combined with & (| for or). The parentheses are important to establish operator order:
In [328]: mask=(0<arr[:,2:]) & (arr[:,2:]<3)
In [329]: mask
Out[329]:
array([[ True, True],
[False, False],
[False, False],
[False, False],
[False, False],
[ True, False],
[ True, False],
[ True, False],
[False, False],
[ True, True],
[ True, False],
[False, True],
[ True, False],
[False, False],
[ True, False],
[False, False],
[False, False],
[False, False],
[False, True],
[False, False]], dtype=bool)
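To make the point about one-sided tests concrete, here is a small sketch on a 3-row array (rows borrowed from the output above; chained_ok and same are illustrative names). The chained form raises ValueError because Python tries to reduce the intermediate boolean array to a single truth value. Dropping the parentheses fails the same way, since & binds tighter than <.

```python
import numpy as np

arr = np.array([[9, 4, 1, 1], [6, 1, 9, 6], [0, 9, 2, 1]])
cols = arr[:, 2:]

# The chained comparison that works for plain Python numbers fails on arrays
try:
    0 < cols < 3
    chained_ok = True
except ValueError:
    chained_ok = False

# Two one-sided tests joined with &, each wrapped in parentheses
mask = (0 < cols) & (cols < 3)

# np.logical_and is an equivalent spelling that needs no extra parentheses
same = np.logical_and(cols > 0, cols < 3)
```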
Now we can select rows where either column is in the right range:
In [330]: arr[mask.any(axis=1),:]
Out[330]:
array([[9, 4, 1, 1],
[4, 5, 1, 8],
[0, 9, 1, 3],
[7, 7, 1, 5],
[0, 9, 2, 1],
[4, 9, 1, 6],
[5, 1, 5, 2],
[1, 5, 2, 0],
[1, 9, 2, 4],
[0, 4, 3, 1]])
or where both are:
In [331]: arr[mask.all(axis=1),:]
Out[331]:
array([[9, 4, 1, 1],
[0, 9, 2, 1]])
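The same idea can be pointed at the ranges from the question, with a different range per column. This is a sketch under the assumption that the third column should fall in range(1,4) and the fourth in range(0,3); the small data array is made up because the original one is not shown.

```python
import numpy as np

# made-up stand-in for the array in the question
data = np.array([[4, 3, 1, 2], [4, 3, 5, 1], [1, 2, 1, 0]])

third = data[:, 2]
fourth = data[:, 3]

# one mask per column, combined with & so both conditions must hold
mask = (third >= 1) & (third < 4) & (fourth >= 0) & (fourth < 3)

selected = data[mask]            # rows satisfying both ranges
count = np.count_nonzero(mask)   # number of such rows
```

For contiguous integer ranges this avoids listing every combination explicitly.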
np.where is often used to convert the boolean array into index numbers:
In [332]: np.where(mask.all(axis=1))
Out[332]: (array([0, 9], dtype=int32),)
In [333]: arr[_,:]
Out[333]:
array([[[9, 4, 1, 1],
[0, 9, 2, 1]]])
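Note that np.where with a single argument returns a tuple with one index array per dimension. The extra pair of brackets in Out[333] appears to come from indexing with that whole tuple. A sketch of pulling out the index array first, on a 3-row array built from rows shown above (rows, same_rows, picked and n are illustrative names):

```python
import numpy as np

arr = np.array([[9, 4, 1, 1], [6, 1, 9, 6], [0, 9, 2, 1]])
mask = ((0 < arr[:, 2:]) & (arr[:, 2:] < 3)).all(axis=1)

rows = np.where(mask)[0]           # first (and only) element of the tuple
same_rows = np.flatnonzero(mask)   # same indices without the tuple wrapper

picked = arr[rows]                 # stays 2D: one row per matching index
n = rows.size                      # number of matching rows
```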
Upvotes: 2