Reputation: 4815
I kept getting ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
when trying boolean tests with pandas. I did not understand what the message meant, so I decided to try to figure it out.
However, I am totally confused at this point.
Here I create a dataframe of two variables, with a single data point shared between them (3):
In [75]:
import pandas as pd
df = pd.DataFrame()
df['x'] = [1,2,3]
df['y'] = [3,4,5]
Now I try all(is x less than y), which I translate to "are all the values of x less than y", and I get an answer that doesn't make sense.
In [79]:
if all(df['x'] < df['y']):
    print('True')
else:
    print('False')
True
Next I try any(is x less than y), which I translate to "is any value of x less than y", and I get another answer that doesn't make sense.
In [77]:
if any(df['x'] < df['y']):
    print('True')
else:
    print('False')
False
In short: what do any() and all() actually do?
Upvotes: 22
Views: 58636
Reputation: 981
To compare two pd.DataFrame objects for equality of both content and structure, you can use:
import pandas as pd
def are_df_equal(df: pd.DataFrame, df2: pd.DataFrame) -> bool:
    return df.equals(df2) and (df.all() == df2.all()).all()
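As a rough usage sketch (the frames `a`, `b` and `c` are made up for illustration, and the result is wrapped in `bool()` only so that a plain Python bool comes back), the helper could be exercised like this. As far as I can tell, `df.equals(df2)` already checks shape, values and dtypes, so the second clause should never change the result.

```python
import pandas as pd

def are_df_equal(df: pd.DataFrame, df2: pd.DataFrame) -> bool:
    # `and` short-circuits, so the column-wise check only runs
    # when equals() has already returned True.
    return bool(df.equals(df2) and (df.all() == df2.all()).all())

# Hypothetical sample frames, not from the original post.
a = pd.DataFrame({'x': [1, 2, 3], 'y': [3, 4, 5]})
b = a.copy()                                        # identical copy
c = pd.DataFrame({'x': [1, 2, 3], 'y': [3, 4, 6]})  # one value differs

same = are_df_equal(a, b)
different = are_df_equal(a, c)
print(same, different)  # True False
```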
Upvotes: 0
Reputation: 4290
The error message suggests using the Series methods any() and all(), not Python's built-in functions.
I don't quite understand where your strange output comes from (I get True in both cases with Python 2.7 and pandas 0.17.0). But try the following, which should work. It uses the Series.any() and Series.all() methods.
import pandas as pd
df = pd.DataFrame()
df['x'] = [1,2,3]
df['y'] = [3,4,5]
print((df['x'] < df['y']).all())  # more pythonic way of
print((df['x'] < df['y']).any())  # doing the same thing
This should print:
True
True
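To make the original error message more concrete, here is a minimal sketch (the names `mask`, `raised`, `all_less` and so on are illustrative, not from the question). Comparing two Series gives a boolean Series, and Python cannot reduce that to a single truth value on its own, which is what the ValueError complains about. The .all() and .any() methods do that reduction explicitly. With this data the built-in functions should give the same answers, since they simply iterate over the values.

```python
import pandas as pd

df = pd.DataFrame({'x': [1, 2, 3], 'y': [3, 4, 5]})

# Element-wise comparison: a boolean Series, not a single bool.
mask = df['x'] < df['y']

# Using the Series directly in a boolean context raises ValueError.
try:
    if mask:
        pass
    raised = False
except ValueError:
    raised = True

all_less = bool(mask.all())  # every x is less than its y
any_less = bool(mask.any())  # at least one x is less than its y

# The built-ins iterate over the Series values one by one.
builtin_all = all(mask)
builtin_any = any(mask)

print(raised, all_less, any_less, builtin_all, builtin_any)
```

If the built-ins ever disagree with the Series methods in an interactive session, one thing worth checking is whether `any` or `all` has been rebound to something else in that namespace. That is only a guess about the question's output, not something the post confirms.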
Upvotes: 14