Reputation: 39477
Say we have a pandas DataFrame df, and let's say we call df.dropna(how='all', thresh=0) on it.
Isn't this set of arguments logically contradictory? In fact, when setting how="all", isn't any value we could specify for thresh either redundant or contradictory? I don't see how thresh and how="all" are supposed to work together. It's confusing from a logical standpoint.
# Isn't this logically contradicting?
# Argument how='all' says "remove rows which have all values set to NA"
# Argument thresh=0 says "keep rows which have at least 0 non-NA values" i.e. "keep all rows"
# Seems the thresh=0 takes priority over how="all"
df.dropna(how='all', thresh=0)
If I may generalize this a bit: I come from Java, which has method overloads, and I've always wondered about this issue in Python. Since Python has no method overloads, it uses a single method name with multiple parameters, usually with various default values. So a method in Python can easily turn into a master method representing a whole set of methods/procedures. But what if some combinations of argument values (like the one in my example) logically contradict each other, or just don't make sense together? In that case, what is a good (idiomatic Python) way to design such methods? The only solution I can think of is to document such methods pedantically and say: "Don't call this method with arguments a and b together, as it doesn't make sense and/or the outcome may be unexpected".
Upvotes: 0
Views: 288
Reputation: 42422
The source code of pandas reads:
if thresh is not None:
    mask = count >= thresh
elif how == "any":
    mask = count == len(agg_obj._get_axis(agg_axis))
elif how == "all":
    mask = count > 0
So thresh takes precedence over how: whenever thresh is not None, the how branches are never reached.
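To make the branch order concrete, here is a minimal pure-Python sketch of the same logic, not the pandas implementation itself. The names keep_mask, keep_mask_strict and _UNSET are hypothetical, rows are plain tuples, and None stands in for NA. The second function illustrates one possible answer to the design question: use a sentinel default to detect that both arguments were passed, and raise instead of silently letting one win.

```python
# Sentinel that distinguishes "argument not passed" from any real value.
_UNSET = object()


def keep_mask(rows, how="any", thresh=None):
    """Mimic the quoted branch order: thresh is checked first, so how is ignored."""
    # Number of non-NA (here: non-None) values in each row.
    counts = [sum(value is not None for value in row) for row in rows]
    if thresh is not None:
        return [count >= thresh for count in counts]
    if how == "any":
        # Keep only rows with no missing values at all.
        return [count == len(row) for count, row in zip(counts, rows)]
    if how == "all":
        # Keep rows that have at least one non-missing value.
        return [count > 0 for count in counts]
    raise ValueError(f"invalid how option: {how}")


def keep_mask_strict(rows, how=_UNSET, thresh=_UNSET):
    """Hypothetical stricter variant: reject the contradictory combination."""
    if how is not _UNSET and thresh is not _UNSET:
        raise TypeError("pass either how or thresh, not both")
    if thresh is not _UNSET:
        return keep_mask(rows, thresh=thresh)
    return keep_mask(rows, how="any" if how is _UNSET else how)


rows = [(1, 2), (None, 3), (None, None)]
print(keep_mask(rows, how="all"))            # [True, True, False]
print(keep_mask(rows, how="all", thresh=0))  # [True, True, True]
```

With thresh=0 every row is kept, because the how="all" branch is never evaluated. As far as I know, more recent pandas releases take the stricter route and raise a TypeError when both how and thresh are passed, but check the documentation for the version you have installed.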
Upvotes: 1