Arvind Kumar Avinash

Reputation: 79395

not bool does not work but bool != True works

I am trying to find the list of unique colours from the CSV downloaded from https://data.cityofnewyork.us/Environment/2018-Central-Park-Squirrel-Census-Squirrel-Data/vfnx-vebw

The following does not work:

import pandas

data = pandas.read_csv("2018_Central_Park_Squirrel_Census_-_Squirrel_Data.csv")

fur_color_col = data["Primary Fur Color"]
print(data[not pandas.isna(data["Primary Fur Color"])]["Primary Fur Color"].unique())

The error is:

Traceback (most recent call last):
  File "/Users/arvind.avinash/PycharmProjects/AdHoc/main.py", line 6, in <module>
    print(data[not pandas.isna(data["Primary Fur Color"])]["Primary Fur Color"].unique())
  File "/Users/arvind.avinash/PycharmProjects/AdHoc/venv/lib/python3.9/site-packages/pandas/core/generic.py", line 1537, in __nonzero__
    raise ValueError(
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

The following works:

import pandas

data = pandas.read_csv("2018_Central_Park_Squirrel_Census_-_Squirrel_Data.csv")

fur_color_col = data["Primary Fur Color"]
print(data[pandas.isna(data["Primary Fur Color"]) != True]["Primary Fur Color"].unique())

and outputs:

['Gray' 'Cinnamon' 'Black']

Why does not bool not work while bool != True does?

Upvotes: 2

Views: 211

Answers (2)

Ture Pålsson

Reputation: 6786

The reason not does not work is that the language spec defines it to return a single boolean value, which does not make much sense for a Pandas Series:

The operator not yields True if its argument is false, False otherwise.

The != operator has no such limitations, which means that Pandas is free to define it as an element-by-element comparison.
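To make the difference concrete, here is a minimal pure-Python sketch. MiniSeries is a hypothetical toy class written for this illustration, not how pandas is actually implemented; it only mimics the relevant behaviour: != and ~ are overloaded to work element by element, while the truth-value hook that not relies on raises an error.

```python
class MiniSeries:
    """Hypothetical toy container, NOT the pandas implementation."""

    def __init__(self, values):
        self.values = list(values)

    def __ne__(self, other):
        # `!=` may return any object, so it can work element by element.
        return MiniSeries(v != other for v in self.values)

    def __invert__(self):
        # `~` is overloadable too; here it flips each boolean.
        return MiniSeries(not v for v in self.values)

    def __bool__(self):
        # `not x` first asks for bool(x), which must be one True/False.
        raise ValueError("The truth value of a MiniSeries is ambiguous.")


is_missing = MiniSeries([False, True, False])

print((is_missing != True).values)  # element-wise comparison
print((~is_missing).values)         # element-wise negation

try:
    not is_missing                  # needs a single truth value
except ValueError as exc:
    print("not failed:", exc)
```

Python gives a class no way to make not return anything other than a plain bool, so a container that has no sensible single truth value can only refuse, as the toy class does here.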

Upvotes: 2

jezrael

Reputation: 863166

Because negation for arrays (in pandas or NumPy) is done with the ~ operator, so you need:

print(data[~pandas.isna(data["Primary Fur Color"])]["Primary Fur Color"].unique())

Comparisons on arrays (in pandas or NumPy) use the same operators as pure Python, so != True works fine.


Alternatively, you can use Series.notna:

print(data.loc[data["Primary Fur Color"].notna(), "Primary Fur Color"].unique())
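As a quick check that the three spellings select the same rows, here is a small sketch. The colors Series is made-up stand-in data (an assumption for illustration), not the real squirrel census CSV.

```python
import pandas

# Hypothetical stand-in for the "Primary Fur Color" column.
colors = pandas.Series(["Gray", None, "Cinnamon", "Gray", None, "Black"])

mask_invert = ~pandas.isna(colors)          # element-wise NOT
mask_compare = pandas.isna(colors) != True  # element-wise comparison
mask_notna = colors.notna()                 # built-in shortcut

# All three boolean masks should be identical.
same_masks = mask_invert.equals(mask_compare) and mask_invert.equals(mask_notna)

unique_colors = list(colors[mask_notna].unique())
print(same_masks, unique_colors)
```

Of the three, notna() is probably the most readable, since it states the intent directly instead of negating isna().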

Upvotes: 2
