Reputation: 129
I am looking for the equivalent of an SQL 'where' query over a table. I have done a lot of searching and I'm either using the wrong search terms or not understanding the answers. Probably both.
Here, the table is a 2-dimensional NumPy array.
import numpy as np

my_array = np.array([[32, 55, 2],
                     [15, 2, 60],
                     [76, 90, 2],
                     [ 6, 65, 2]])
I wish to end up with a NumPy array of the same shape containing only the rows where, e.g., the second column's value is >= 55 AND <= 65.
So my desired numpy array would be...
desired_array([[32, 55, 2],
[ 6, 65, 2]])
Also, does 'desired_array' order match 'my_array' order?
Upvotes: 1
Views: 1496
Reputation: 173
You don't mean the same shape; you probably mean the same number of columns. The shape of my_array is (4, 3) and the shape of your desired array is (2, 3). I would recommend masking, too.
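To make the shape difference concrete, here is a minimal sketch that prints both shapes. The names col, row_mask and filtered are illustrative and do not appear in the question.

```python
import numpy as np

my_array = np.array([[32, 55, 2],
                     [15, 2, 60],
                     [76, 90, 2],
                     [6, 65, 2]])

# Boolean mask over rows: True where the second column lies in [55, 65].
col = my_array[:, 1]
row_mask = (col >= 55) & (col <= 65)
filtered = my_array[row_mask]

print(my_array.shape)  # (4, 3)
print(filtered.shape)  # (2, 3)
```

The row count changes with the filter; only the number of columns is preserved.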
Upvotes: 0
Reputation: 61498
The general NumPy approach to filtering an array is to create a "mask" that matches the desired part of the array, and then use it to index into the array.
>>> my_array[((55 <= my_array) & (my_array <= 65))[:, 1]]
array([[32, 55, 2],
[ 6, 65, 2]])
Breaking it down:
# Comparing an array to a scalar gives you an array of all the results of
# individual element comparisons (this is called "broadcasting").
# So we take two such boolean arrays, resulting from comparing values to the
# two thresholds, and combine them together.
mask = (55 <= my_array) & (my_array <= 65)
# We only want to care about the [1] element in the second array dimension,
# so we take a 1-dimensional slice of that mask.
desired_rows = mask[:, 1]
# Finally we use those values to select the desired rows.
desired_array = my_array[desired_rows]
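(The first two operations could instead be swapped: slice out the column first, then compare. I imagine that would be more efficient, since only one column gets compared instead of the whole array, but it wouldn't matter for something this small. The order shown is simply the one that occurred to me first.)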
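A minimal sketch of the swapped order, assuming the same my_array as in the question. The name second_col is introduced here for illustration only.

```python
import numpy as np

my_array = np.array([[32, 55, 2],
                     [15, 2, 60],
                     [76, 90, 2],
                     [6, 65, 2]])

# Slice out the second column first, so only those values are compared.
second_col = my_array[:, 1]

# Combine the two threshold comparisons into a 1-dimensional row mask.
desired_rows = (55 <= second_col) & (second_col <= 65)

# Use the mask to select the matching rows.
desired_array = my_array[desired_rows]
print(desired_array)
```

This produces the same rows as the version above; the comparisons just run on 4 values instead of 12.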
Upvotes: 0
Reputation: 2614
Just make a mask and use it.
mask = np.logical_and(my_array[:, 1] >= 55, my_array[:, 1] <= 65)
desired_array = my_array[mask]
desired_array
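On the ordering question: boolean-mask indexing keeps the selected rows in their original order. The sketch below checks this with np.flatnonzero, which is not part of the answer above and is used here only to list the indices of the selected rows.

```python
import numpy as np

my_array = np.array([[32, 55, 2],
                     [15, 2, 60],
                     [76, 90, 2],
                     [6, 65, 2]])

mask = np.logical_and(my_array[:, 1] >= 55, my_array[:, 1] <= 65)
desired_array = my_array[mask]

# Indices of the rows where the mask is True, in ascending order.
selected = np.flatnonzero(mask)
print(selected)  # [0 3]

# Indexing with those ascending indices gives the same rows as the mask.
print(np.array_equal(my_array[selected], desired_array))  # True
```

Rows 0 and 3 of my_array come out as rows 0 and 1 of desired_array, so the relative order is unchanged.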
Upvotes: 4
Reputation: 1
You can use the built-in filter function with a lambda that checks each row for the desired condition:
import numpy as np

my_array = np.array([[32, 55, 2],
                     [15, 2, 60],
                     [76, 90, 2],
                     [ 6, 65, 2]])
# Keep only the rows whose second element lies in [55, 65].
desired_array = np.array(list(filter(lambda row: 55 <= row[1] <= 65, my_array)))
Upon running this, we get:
>>> desired_array
array([[32, 55, 2],
[ 6, 65, 2]])
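One difference from the mask approach is worth noting when no row matches. The sketch below uses hypothetical thresholds (low and high, chosen so that nothing matches) to compare the two results.

```python
import numpy as np

my_array = np.array([[32, 55, 2],
                     [15, 2, 60],
                     [76, 90, 2],
                     [6, 65, 2]])

# Hypothetical range that no row satisfies, chosen for illustration.
low, high = 1000, 2000

# filter-based version: builds an array from an empty list.
via_filter = np.array(list(filter(lambda row: low <= row[1] <= high, my_array)))

# Mask-based version: indexes the original array with an all-False mask.
via_mask = my_array[(my_array[:, 1] >= low) & (my_array[:, 1] <= high)]

print(via_filter.shape)  # (0,)
print(via_mask.shape)    # (0, 3)
```

The mask version keeps the column dimension even when the result is empty, whereas the list-based version collapses to a 1-dimensional empty array.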
Upvotes: -1