Anton

Reputation: 4815

Pandas Boolean .any() .all()

I kept getting "ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()." when trying boolean tests with pandas. Not understanding what it meant, I decided to try to figure it out.

However, I am totally confused at this point.

Here I create a dataframe of two variables, with a single data point shared between them (3):

In [75]:

import pandas as pd

df = pd.DataFrame()

df['x'] = [1,2,3]
df['y'] = [3,4,5]

Now I try all(is x less than y), which I translate to "are all the values of x less than y", and I get an answer that doesn't make sense.

In [79]:

if all(df['x'] < df['y']):
    print('True')
else:
    print('False')
True

Next I try any(is x less than y), which I translate to "is any value of x less than y", and I get another answer that doesn't make sense.

In [77]:

if any(df['x'] < df['y']):
    print('True')
else:
    print('False')
False

In short: what do any() and all() actually do?

Upvotes: 22

Views: 58636

Answers (2)

bart-kosmala

Reputation: 981

To compare two pd.DataFrame objects for both content and structure equality you can use:

import pandas as pd

def are_df_equal(df: pd.DataFrame, df2: pd.DataFrame) -> bool:
    return df.equals(df2) and (df.all() == df2.all()).all()
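
As a rough illustration of what DataFrame.equals checks, here is a minimal sketch. The frames a, b, c and d are made up for this example and are not from the question; the point is only that equals compares shape, values and column dtypes together.

```python
import pandas as pd

# Hypothetical frames, made up for this illustration
a = pd.DataFrame({"x": [1, 2, 3], "y": [3, 4, 5]})
b = a.copy()                                                  # same values, same dtypes
c = pd.DataFrame({"x": [1, 2, 3], "y": [3, 4, 6]})            # one value differs
d = pd.DataFrame({"x": [1.0, 2.0, 3.0], "y": [3.0, 4.0, 5.0]})  # same numbers, float dtype

same = a.equals(b)              # True: identical content and structure
different_value = a.equals(c)   # False: last value of y differs
different_dtype = a.equals(d)   # False: int64 columns vs float64 columns

print(same, different_value, different_dtype)
```

Note that equals already returns a single bool, so it can be used directly in an if statement.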

Upvotes: 0

Sergey Antopolskiy

Reputation: 4290

Pandas suggests using the Series methods any() and all(), not Python's built-in functions.

I don't quite understand the source of the strange output you get (I get True in both cases with Python 2.7 and pandas 0.17.0). Try the following, which uses the Series.any() and Series.all() methods; it should work.

import pandas as pd

df = pd.DataFrame()

df['x'] = [1,2,3]
df['y'] = [3,4,5]

print((df['x'] < df['y']).all())  # more pythonic way of
print((df['x'] < df['y']).any())  # doing the same thing

This should print:

True
True
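
To tie this back to the ValueError from the question, here is a small sketch of where it comes from and how the Series methods avoid it. The variable names (mask, raised and so on) are made up for this illustration; the data is the same as above.

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3], "y": [3, 4, 5]})
mask = df["x"] < df["y"]   # boolean Series: True, True, True

# Using a Series directly in a boolean context is what triggers
# "The truth value of a Series is ambiguous"
try:
    if mask:
        pass
    raised = False
except ValueError:
    raised = True

# The Series methods reduce the whole Series to a single value
all_less = bool(mask.all())   # is every x less than its y?
any_less = bool(mask.any())   # is at least one x less than its y?

# The built-ins iterate over the Series values, so here they agree
builtin_all = all(mask)
builtin_any = any(mask)

# Iterating over a DataFrame yields column labels, not values,
# so the built-ins are not a substitute for df.all() / df.any()
labels = list(df)

print(raised, all_less, any_less, builtin_all, builtin_any, labels)
```

On this data the built-ins and the methods give the same answers for a Series; the methods are still the safer habit because they behave predictably on DataFrames as well.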

Upvotes: 14
