IDontKnowAnything

Reputation: 213

How to use query function with bool in python pandas?

I'm trying to do something like

df.query("'column' == 'a'").count()

but with

df.query("'column' == False").count()

What is the right way of using query with a bool column?

Upvotes: 7

Views: 15602

Answers (3)

PascalVKooten

Reputation: 21451

Even simpler using query:

df.query("~column").count()

Below are the non-query ways to do it. They are arguably less pretty and can be slower (query is sometimes heavily optimized), though they are still the more common idiom:

df[~df["column"]].count()

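or, if you prefer (more readable, but it only works when the column name is a valid Python identifier that does not clash with an existing DataFrame attribute or method):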

df[~df.column].count()
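A small sketch, assuming a boolean column literally named "column" (the sample data is made up for illustration), to check that the query forms and the boolean-indexing forms select the same rows. The pandas query syntax also accepts the word not as an alternative to ~.

```python
import pandas as pd

# Hypothetical sample data: a bool column named "column" plus another column.
df = pd.DataFrame({"column": [False, True, False], "other": [1, 2, 3]})

via_tilde = df.query("~column")    # negation inside the query string
via_not = df.query("not column")   # query also understands the keyword `not`
via_index = df[~df["column"]]      # plain boolean indexing
via_attr = df[~df.column]          # attribute access, same mask

# All four variants should keep rows 0 and 2, where column is False.
for result in (via_tilde, via_not, via_attr):
    assert result.equals(via_index)

print(via_index.count())
```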

Upvotes: 3

Bohdan Turani

Reputation: 19

pandas uses pandas.eval() to evaluate the expression you pass to DataFrame.query(). The pandas.eval() documentation describes it as follows:

Evaluate a Python expression as a string using various backends.
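A minimal sketch of what that means here, with made-up data: DataFrame.eval turns the same expression string into a boolean mask, and filtering the frame with that mask should give exactly the rows that query returns.

```python
import pandas as pd

df = pd.DataFrame({"column": [False, True, False], "other": [1, 2, 3]})

# eval() evaluates the expression string into an elementwise boolean mask...
mask = df.eval("column == False")
print(mask.tolist())  # [True, False, True]

# ...and query() filters the frame with that same mask.
assert df.query("column == False").equals(df[mask])
```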

In plain Python, PEP 8 discourages comparing a value to False with == (writing not x is preferred), and pandas.query() does not support the is operator at all. On a column, however, == and != compare elementwise, so there are workarounds:

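  • Compare the column elementwise. df.query("column == False") keeps the rows where the value is False, and df.query("column != False") keeps all the others. (By contrast, column != column is True only for NaN values, because NaN never equals itself.)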

  • We can use pandas functions by passing the pandas module in the local_dict keyword parameter and referring to it with @ inside the expression. For example:

    import pandas as pd
    local_vars = {'pd': pd}
    df.query(expr="@pd.isna(column)",local_dict=local_vars)
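A sketch of the local_dict mechanism described above, assuming made-up data and a hypothetical variable name flag: a name supplied through local_dict is referenced with @ inside the expression, and == / != against False compare elementwise on the column.

```python
import pandas as pd

df = pd.DataFrame({"column": [False, True, False], "other": [1, 2, 3]})

# "flag" is a hypothetical name, supplied to the expression only via local_dict.
local_vars = {"flag": False}

# @flag is looked up in local_dict; the comparison runs elementwise.
is_false = df.query("column == @flag", local_dict=local_vars)
not_false = df.query("column != @flag", local_dict=local_vars)

print(is_false.index.tolist())   # [0, 2]
print(not_false.index.tolist())  # [1]
```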
    

Also, I am not sure what you are trying to do with count(), since count() counts the non-NA cells in each column (or row).

  • If you are simply trying to count the rows, then use shape (df.shape[0]).
  • If you are trying to count the non-NA cells in each column, using only the rows where column equals False, then count() is fine and should work.
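To make the difference between the two cases concrete, here is a small sketch with made-up data in which one of the False rows has a missing value in the other column: shape[0] counts rows, while count() counts non-NA cells per column.

```python
import numpy as np
import pandas as pd

# Made-up data: "other" is missing (NaN) in one of the rows where column is False.
df = pd.DataFrame({"column": [False, True, False], "other": [1.0, 2.0, np.nan]})

subset = df.query("column == False")

print(subset.shape[0])          # 2 -> number of rows
print(subset.count().tolist())  # [2, 1] -> non-NA cells per column
```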

P.S. Don't put quotes around the column name in df.query(): a quoted name is a string literal, not a reference to the column.

Upvotes: -1

timgeb

Reputation: 78750

It's simply 'column == False'.

>>> import pandas as pd
>>> df = pd.DataFrame([[False, 1], [True, 2], [False, 3]], columns=['column', 'another_column'])
>>> df
   column  another_column
0   False               1
1    True               2
2   False               3
>>> df.query('column == False')
   column  another_column
0   False               1
2   False               3
>>> df.query('column == False').count()
column            2
another_column    2
dtype: int64

Personally, I prefer boolean indexing (if applicable to your situation).

>>> df[~df['column']]
   column  another_column
0   False               1
2   False               3
>>> df[~df['column']].count()
column            2
another_column    2
dtype: int64

Upvotes: 7
