Peter
Peter

Reputation: 782

How to let numpy.where() return only the elements satisfying the condition?

I want to loop through a Pandas DataFrame using numpy.where() and get a list that contains only the elements that satisfy the condition.

For example, lets say I have the following pandas DataFrame:

df = pd.DataFrame({"A": [1, 2, 3, 5, 3, 7, 3],
                   "B": [0, 1, 6, 4, 9, 8, 2],
                   "id": [0, 1, 2, 3, 4, 5, 6]
                  })

I would like to return a list of those id values for which column A is equal to 3 and column B is greater than or equal to 5

I tried:

ids = np.where((df["A"] == 3) & (df["B"] >= 5)), df["id"])

But that gives the following error:

ValueError: either both or neither of x and y should be given

I realise I could solve this by just returning some default value like -1 in the else part of the where and later remove all occurences of -1 from ids, but that's both ineffective for my huge Dataframe and does not appear to be the most elegant way.

How to solve this in the most efficient (least time consuming) way? If a where is not the most efficient solution I'm open to other suggestions.

Upvotes: 1

Views: 401

Answers (2)

Mayank Porwal
Mayank Porwal

Reputation: 34046

Use:

In [1225]: df.loc[(df["A"] == 3) & (df["B"] >= 5), 'id'].to_numpy()
Out[1225]: array([2, 4])

Upvotes: 2

navneethc
navneethc

Reputation: 1466

You can do this within Pandas itself by using either boolean indexing or the query method on the dataframe.

In [4]: import pandas as pd

In [5]: df = pd.DataFrame({"A": [1, 2, 3, 5, 3, 7, 3],
   ...:                    "B": [0, 1, 6, 4, 9, 8, 2],
   ...:                    "id": [0, 1, 2, 3, 4, 5, 6]
   ...:                   })

In [6]: df
Out[6]:
   A  B  id
0  1  0   0
1  2  1   1
2  3  6   2
3  5  4   3
4  3  9   4
5  7  8   5
6  3  2   6

In [7]: df[(df["A"] == 3) & (df["B"] >= 5)]['id'].to_list()
Out[7]: [2, 4]

In [8]: df.query("A == 3 and B >= 5")['id'].to_list()
Out[8]: [2, 4]

Upvotes: 2

Related Questions