Reputation: 782
I want to loop through a Pandas DataFrame using numpy.where() and get a list that contains only the elements that satisfy the condition.
For example, lets say I have the following pandas DataFrame:
df = pd.DataFrame({"A": [1, 2, 3, 5, 3, 7, 3],
"B": [0, 1, 6, 4, 9, 8, 2],
"id": [0, 1, 2, 3, 4, 5, 6]
})
I would like to return a list of those id
values for which column A
is equal to 3 and column B
is greater than or equal to 5
I tried:
ids = np.where((df["A"] == 3) & (df["B"] >= 5)), df["id"])
But that gives the following error:
ValueError: either both or neither of x and y should be given
I realise I could solve this by just returning some default value like -1 in the else
part of the where
and later remove all occurences of -1 from ids
, but that's both ineffective for my huge Dataframe and does not appear to be the most elegant way.
How to solve this in the most efficient (least time consuming) way? If a where
is not the most efficient solution I'm open to other suggestions.
Upvotes: 1
Views: 401
Reputation: 34046
Use:
In [1225]: df.loc[(df["A"] == 3) & (df["B"] >= 5), 'id'].to_numpy()
Out[1225]: array([2, 4])
Upvotes: 2
Reputation: 1466
You can do this within Pandas itself by using either boolean indexing or the query method on the dataframe.
In [4]: import pandas as pd
In [5]: df = pd.DataFrame({"A": [1, 2, 3, 5, 3, 7, 3],
...: "B": [0, 1, 6, 4, 9, 8, 2],
...: "id": [0, 1, 2, 3, 4, 5, 6]
...: })
In [6]: df
Out[6]:
A B id
0 1 0 0
1 2 1 1
2 3 6 2
3 5 4 3
4 3 9 4
5 7 8 5
6 3 2 6
In [7]: df[(df["A"] == 3) & (df["B"] >= 5)]['id'].to_list()
Out[7]: [2, 4]
In [8]: df.query("A == 3 and B >= 5")['id'].to_list()
Out[8]: [2, 4]
Upvotes: 2