T. Pieper

Reputation: 36

Pandas DataFrame isin(): How does conditional selection work in detail?

While working with pandas I ran into an issue which I can't quite explain. Let me give an example where the DataFrame is called "reviews":

The following code doesn't run:

reviews[(reviews["points"] >= 95) & (reviews["country"] in ["Australia"])]

Instead one can use:

reviews[(reviews["points"] >= 95) & (reviews["country"].isin(["Australia"]))]

My first assumption was that this is caused by the way the bitwise operator & works, but when I tested this I was surprised to find that the following expression evaluates to True: True & ("hi" in ["hi", "Hello"])

Obviously reviews["country"] is not just a str. I guess with the operator >= some magic happens that is not implemented for in. Therefore, isin() is necessary. Maybe someone can explain this further / better?

The example works with something like the following DataFrame:

    country     description     designation     points  
0   Italy       Aromas          Vulkà Bianco    87  

This structure is basically taken from https://www.kaggle.com/learn/pandas lesson 2.9.

Error message: ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Upvotes: 0

Views: 184

Answers (1)

Ji Wei

Reputation: 881

in is a Python keyword, while isin() is a method of Series that checks "whether each element in the DataFrame is contained in values" (pandas documentation).
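To make the difference concrete, here is a minimal sketch. The sample data is made up for illustration, modelled on the "reviews" DataFrame from the question, and the variable names are my own. It shows that >= and isin() each return one boolean per row, while in has to produce a single True/False. As far as I understand it, in on a list compares each list item to the Series with ==, gets a boolean Series back, and then tries to turn that Series into one bool, which is what raises the ValueError.

```python
import pandas as pd

# Hypothetical sample data modelled on the question's "reviews" DataFrame.
reviews = pd.DataFrame({
    "country": ["Italy", "Australia", "Australia"],
    "points": [87, 96, 90],
})

# >= is overloaded by Series and returns one boolean per row.
high_points = reviews["points"] >= 95

# isin() is also element-wise: one boolean per row.
in_australia = reviews["country"].isin(["Australia"])

# Both masks are boolean Series, so & combines them row by row.
selected = reviews[high_points & in_australia]

# `in` must yield a single bool. The list compares "Australia" == Series,
# which gives a boolean Series, and converting that to one bool fails.
try:
    reviews["country"] in ["Australia"]
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(selected)
print(error_message)
```

So the & operator is not the problem: it works fine on two boolean Series. The failure happens earlier, inside the in expression, before & is ever evaluated.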

Upvotes: 2
