notIntoXS

Reputation: 129

Numpy array: How to extract whole rows based on values in a column

I am looking for the equivalent of an SQL 'where' query over a table. I have done a lot of searching and I'm either using the wrong search terms or not understanding the answers. Probably both.

Here the table is a 2-dimensional NumPy array.

import numpy as np

my_array = np.array([[32, 55,  2],
                     [15,  2, 60],
                     [76, 90,  2],
                     [ 6, 65,  2]])

I wish to end up with a NumPy array of the same shape containing only the rows where, e.g., the second column value is >= 55 AND <= 65.

So my desired numpy array would be...

desired_array = np.array([[32, 55,  2],
                          [ 6, 65,  2]])

Also, will the row order of 'desired_array' match the row order of 'my_array'?

Upvotes: 1

Views: 1496

Answers (4)

Manh Khôi Duong

Reputation: 173

You don't mean the same shape. You probably mean the same number of columns. The shape of my_array is (4, 3) and the shape of your desired array is (2, 3). I would recommend masking, too.
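
As a quick check of the shapes mentioned above, here is a minimal sketch using the array from the question. The name `mask` is only illustrative, and this is just one way to build it.

```python
import numpy as np

my_array = np.array([[32, 55,  2],
                     [15,  2, 60],
                     [76, 90,  2],
                     [ 6, 65,  2]])

# One boolean per row: True where the second column lies in [55, 65].
mask = (my_array[:, 1] >= 55) & (my_array[:, 1] <= 65)
desired_array = my_array[mask]

print(my_array.shape)       # (4, 3)
print(desired_array.shape)  # (2, 3)
```

Filtering drops rows, so only the second dimension (the column count) stays the same.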

Upvotes: 0

Karl Knechtel

Reputation: 61498

The general NumPy approach to filtering an array is to create a boolean "mask" that is True for the desired part of the array, and then use that mask to index into the array.

>>> my_array[((55 <= my_array) & (my_array <= 65))[:, 1]]
array([[32, 55,  2],
       [ 6, 65,  2]])

Breaking it down:

# Comparing an array to a scalar gives you an array of all the results of
# individual element comparisons (this is called "broadcasting").
# So we take two such boolean arrays, resulting from comparing values to the
# two thresholds, and combine them together.
mask = (55 <= my_array) & (my_array <= 65)

# We only care about column 1 (index 1 along the second array dimension),
# so we take that column of the mask as a 1-dimensional boolean array.
desired_rows = mask[:, 1]

# Finally we use those values to select the desired rows.
desired_array = my_array[desired_rows]

(The first two steps could instead be swapped: slice out column 1 first, then compare. I'd expect that to be more efficient because only one column gets compared, but it won't matter for something this small. The order shown is simply the one that occurred to me first.)
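
The swapped order described in the parenthetical can be sketched as follows. The name `col` is introduced here purely for illustration; everything else follows the breakdown above.

```python
import numpy as np

my_array = np.array([[32, 55,  2],
                     [15,  2, 60],
                     [76, 90,  2],
                     [ 6, 65,  2]])

# Slice first: take only column 1 (a 1-dimensional view of 4 values).
col = my_array[:, 1]

# Then compare: the mask is built from 4 values instead of all 12.
desired_rows = (55 <= col) & (col <= 65)

# Select the rows where the mask is True.
desired_array = my_array[desired_rows]
print(desired_array)
```

The result is the same two rows as before; only the amount of data compared changes.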

Upvotes: 0

Gilseung Ahn

Reputation: 2614

Just build a boolean mask from the second column and index the array with it.

mask = np.logical_and(my_array[:, 1] >= 55, my_array[:, 1] <= 65)
desired_array = my_array[mask]
desired_array
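
On the question of ordering: boolean-mask indexing returns the selected rows in the order they appear in the original array. A small sketch to check this, where `row_indices` is an illustrative name and `np.flatnonzero` is used only to make the selected positions visible:

```python
import numpy as np

my_array = np.array([[32, 55,  2],
                     [15,  2, 60],
                     [76, 90,  2],
                     [ 6, 65,  2]])

mask = np.logical_and(my_array[:, 1] >= 55, my_array[:, 1] <= 65)

# Positions of the True entries, always in ascending order.
row_indices = np.flatnonzero(mask)
print(row_indices)  # [0 3]

# Indexing with the mask gives the same rows, in the same order,
# as indexing with those ascending positions.
print(np.array_equal(my_array[mask], my_array[row_indices]))  # True
```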

Upvotes: 4

bcwarner

Reputation: 1

You can use the built-in filter function with a lambda that checks each row for the desired condition:

my_array = np.array([[32, 55,  2],
                     [15,  2, 60], 
                     [76, 90,  2], 
                     [ 6, 65,  2]])

# Keep only the rows whose second element lies in [55, 65].
desired_array = np.array(list(filter(lambda row: 55 <= row[1] <= 65, my_array)))

Upon running this, we get:

>>> desired_array
array([[32, 55,  2],
       [ 6, 65,  2]])

Upvotes: -1
