Reputation: 23
Can anyone explain the following behaviour? I am expecting all three rows to be returned.
import pandas as pd
test_dict = {
'col1':[None, None, None],
'col2':[True, False, True],
'col3':[True, True, False]
}
df = pd.DataFrame(test_dict)
df[df.col1 | df.col2 | df.col3]
# Returns only the first two rows (index 0 and 1)
Replacing the None values with empty strings using df.fillna('') appears to fix it, but I don't understand why the first two rows work fine if None is an issue.
Also, changing the order of the comparisons affects it. If I swap col2 and col3 in the mask, the row with index 1 is no longer returned but the row with index 2 is. If col1 comes last, all rows are returned.
Upvotes: 2
Views: 192
Reputation: 150735
The problem is that the expression is evaluated from left to right. That is, df.col1 | df.col2 | df.col3 is evaluated as
(df.col1 | df.col2) | df.col3
Now, I think it is an implementation choice in pandas that None | True is evaluated as False. So in this case (df.col1 | df.col2) is all False, and the final mask is effectively just df.col3. That's why you only see the first two rows.
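To make the left-to-right reasoning concrete, here is a small pure-Python toy model. It is not pandas' implementation: toy_or and mask are hypothetical helpers, and the rule they encode (a missing left operand gives False, a missing right operand is treated as False) is only an assumption chosen to be consistent with the three orderings reported in the question.

```python
from functools import reduce

def toy_or(left, right):
    """Toy stand-in for the element-wise | described above (NOT pandas code)."""
    if left is None:
        # Assumption: a missing left operand makes the result False.
        return False
    # Assumption: a missing right operand counts as False (bool(None) is False).
    return bool(left) or bool(right)

col1 = [None, None, None]
col2 = [True, False, True]
col3 = [True, True, False]

def mask(*cols):
    # Fold each row left to right, i.e. (a | b) | c.
    return [reduce(toy_or, row) for row in zip(*cols)]

print(mask(col1, col2, col3))  # [True, True, False] -> rows 0 and 1
print(mask(col1, col3, col2))  # [True, False, True] -> rows 0 and 2
print(mask(col2, col3, col1))  # [True, True, True]  -> all rows
```

Under these assumptions, whichever column is combined with col1 first is wiped out, which matches the order dependence described in the question.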
To fix this, use
df[df.any(axis=1)]
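For completeness, a minimal runnable version of this fix on the DataFrame from the question. It assumes the None entries should simply count as "not True"; any(axis=1) checks each row for at least one truthy value, so the result does not depend on column order.

```python
import pandas as pd

df = pd.DataFrame({
    'col1': [None, None, None],
    'col2': [True, False, True],
    'col3': [True, True, False],
})

# True for every row that contains at least one truthy value.
row_mask = df.any(axis=1)

result = df[row_mask]
print(result)  # expected to contain all three rows
```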
Upvotes: 3