Reputation: 213
I'm trying to do something like
df.query("'column' == 'a'").count()
but with
df.query("'column' == False").count()
What is the right way to use query with a bool column?
Upvotes: 7
Views: 15602
Reputation: 21451
Even simpler using query:
df.query("~column").count()
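As a minimal sketch of the negation form above, assuming a small made-up DataFrame (the data and the extra value column are hypothetical, not from the question), both ~ and not can be used to negate a bool column inside the query string:

```python
import pandas as pd

# Hypothetical sample data for illustration only.
df = pd.DataFrame({"column": [False, True, False], "value": [1, 2, 3]})

# Both spellings negate the bool column inside the query string.
with_tilde = df.query("~column")
with_not = df.query("not column")

# Number of rows where column is False.
print(len(with_tilde))
```

Which spelling to use is mostly a matter of taste; not reads closer to plain Python, while ~ matches what you would write in boolean indexing.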
The non-query alternatives below are not as pretty and can sometimes be slower, since query can be heavily optimized.
That said, plain boolean indexing is the standard way to write it:
df[~df["column"]].count()
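or, if you prefer attribute access (more readable, but only possible when the column name is a valid Python identifier that does not clash with an existing DataFrame attribute):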
df[~df.column].count()
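A short sketch of why attribute access is "not always possible", assuming a hypothetical frame in which one column is deliberately named count so that it clashes with the DataFrame.count method:

```python
import pandas as pd

# Hypothetical data; "count" is chosen on purpose to clash with a method name.
df = pd.DataFrame({"column": [False, True, False], "count": [1, 2, 3]})

by_bracket = df[~df["column"]]   # bracket access works for any column name
by_attribute = df[~df.column]    # fine here: "column" is a valid, unused name

# df.count resolves to the method, not the column, so brackets are required.
count_is_method = callable(df.count)

# Counting the False rows directly: each True in the negated mask sums as 1.
n_false = int((~df["column"]).sum())
```

If only the number of False rows is needed, summing the negated mask avoids building the filtered frame at all.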
Upvotes: 3
Reputation: 19
Pandas uses pandas.eval() to evaluate the expression you pass to DataFrame.query(). The documentation describes pandas.eval() as:
Evaluate a Python expression as a string using various backends.
In plain Python you might reach for the is operator to test for False, but DataFrame.query() does not seem to support is. Comparing with == or != is not a problem, though: False == False is True in Python, and inside query() the comparison is applied element-wise to the column. That leaves a few options:
Compare against False directly. df.query("column == False") keeps the rows where column is False, and df.query("column != False") keeps the rows where it is True. Note that column != column is a different check: it is True only for NaN values.
We can use pandas functions if we pass the pandas library in the local_dict keyword parameter. Keep in mind that pd.isna selects missing values, not False ones. Like:
import pandas as pd
local_vars = {'pd': pd}
df.query(expr="@pd.isna(column)", local_dict=local_vars)
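A minimal sketch separating the two checks mentioned in this answer, assuming a hypothetical frame with a bool column and a float column that contains one NaN. Comparing against False filters on the bool values, while comparing a column with itself only finds missing values:

```python
import numpy as np
import pandas as pd

# Hypothetical data: a clean bool column plus a float column with one NaN.
df = pd.DataFrame({"column": [False, True, False], "value": [np.nan, 2.0, 3.0]})

# Rows where the bool column is not False, i.e. where it is True.
true_rows = df.query("column != False")

# NaN is the only value that is unequal to itself, so this finds missing values.
nan_rows = df.query("value != value")
```

The self-comparison trick stays inside the query string, so it does not need any extra names passed in.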
Also, I am not sure what you are trying to do with count(), since count() counts the non-NA cells for each column or row.
P.S. Don't put quotes around the column name in df.query(); a quoted name is treated as a string literal rather than a column.
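To illustrate the remark about count(), here is a small sketch on a hypothetical frame with one missing value. It assumes the goal is the number of matching rows, which the question does not state explicitly:

```python
import numpy as np
import pandas as pd

# Hypothetical data: the last row has a missing value in "value".
df = pd.DataFrame({"column": [False, True, False], "value": [1.0, 2.0, np.nan]})

# Unquoted column name inside the query string.
matched = df.query("column == False")

per_column = matched.count()  # non-NA cells per column, returned as a Series
n_rows = len(matched)         # number of matching rows, a single integer
```

Here count() reports different numbers for the two columns because of the NaN, whereas len() gives the row count regardless of missing values.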
Upvotes: -1
Reputation: 78750
It's simply 'column == False'.
>>> import pandas as pd
>>> df = pd.DataFrame([[False, 1], [True, 2], [False, 3]], columns=['column', 'another_column'])
>>> df
   column  another_column
0   False               1
1    True               2
2   False               3
>>> df.query('column == False')
   column  another_column
0   False               1
2   False               3
>>> df.query('column == False').count()
column            2
another_column    2
dtype: int64
Personally, I prefer boolean indexing (if applicable to your situation).
>>> df[~df['column']]
   column  another_column
0   False               1
2   False               3
>>> df[~df['column']].count()
column            2
another_column    2
dtype: int64
Upvotes: 7