peter.petrov

Reputation: 39477

pandas - dropna - what if arguments contradict?

Say we have a pandas DataFrame df.


And let's say we call df.dropna(how='all', thresh=0) on it.

Isn't this combination of arguments logically contradictory?
In fact, once how="all" is set, isn't any value we could specify for thresh
either redundant or contradictory?

I don't see how thresh and how="all" are supposed to work together;
it's confusing from a logical standpoint.

# Isn't this logically contradictory?
# how='all' says "drop rows in which all values are NA"
# thresh=0 says "keep rows that have at least 0 non-NA values", i.e. "keep all rows"
# It seems that thresh=0 takes priority over how='all'

df.dropna(how='all', thresh=0)
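Each argument is easier to reason about on its own. The sketch below uses a small made-up DataFrame (an assumption, since the original df is not shown) and passes how and thresh separately, never together, to show which rows each setting keeps:

```python
import numpy as np
import pandas as pd

# Hypothetical data (the question's df is not shown):
# row 0 has no NA, row 1 is partly NA, row 2 is entirely NA
df = pd.DataFrame(
    {
        "a": [1.0, np.nan, np.nan],
        "b": [2.0, 3.0, np.nan],
        "c": [4.0, np.nan, np.nan],
    }
)

# Number of non-NA values per row: [3, 1, 0]
print(df.count(axis=1).tolist())

# how="any" (the default): drop rows containing any NA -> [0]
print(df.dropna(how="any").index.tolist())

# how="all": drop only rows that are entirely NA -> [0, 1]
print(df.dropna(how="all").index.tolist())

# thresh=n: keep rows with at least n non-NA values
print(df.dropna(thresh=3).index.tolist())  # [0], same as how="any" here
print(df.dropna(thresh=1).index.tolist())  # [0, 1], same as how="all" here
print(df.dropna(thresh=0).index.tolist())  # [0, 1, 2], nothing is dropped
```

Put differently, when dropping rows without a subset, how="any" amounts to thresh=len(df.columns) and how="all" amounts to thresh=1. Both parameters express the same kind of rule, which is why passing both raises the question of which one wins.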

In fact, to generalize a bit: I come from Java, which has method overloading, and I've always wondered about this in Python. Since Python has no method overloading, it uses a single method name with many parameters, usually with default values, so one Python method can easily turn into a master method that stands in for a whole family of methods. But what if some combinations of argument values (as in my example) contradict each other, or simply make no sense together? In that case, what is a good (idiomatic Python) way to design such methods? The only solution I can think of is to document such methods pedantically and say: "Don't call this method with arguments a and b together; it doesn't make sense, and/or the outcome may be unexpected."
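One common Python answer to the design question is to make the conflicting parameters keyword-only, give them a private sentinel default so the function can tell "not passed" apart from any real value (including 0 or None), and raise a TypeError when both are supplied. The function drop_na_rows below is a hypothetical pure-Python illustration that works on a list of rows with None as the missing value; it is not pandas' implementation, although newer pandas releases appear to apply a similar check and reject dropna(how=..., thresh=...) with a TypeError.

```python
from typing import Any, List, Sequence

# Private sentinel: lets the function distinguish "argument not passed"
# from any value a caller could legitimately pass (including 0 or None).
_NO_DEFAULT = object()


def drop_na_rows(
    rows: Sequence[Sequence[Any]],
    *,  # everything after this must be passed by keyword
    how: Any = _NO_DEFAULT,
    thresh: Any = _NO_DEFAULT,
) -> List[Sequence[Any]]:
    """Hypothetical helper: drop rows containing missing values (None).

    how: "any" (the default) or "all"
    thresh: keep rows with at least this many non-None values
    Passing both is rejected instead of silently letting one of them win.
    """
    if how is not _NO_DEFAULT and thresh is not _NO_DEFAULT:
        raise TypeError("pass either 'how' or 'thresh', not both")
    if how is _NO_DEFAULT:
        how = "any"
    if how not in ("any", "all"):
        raise ValueError(f"invalid how option: {how!r}")

    result = []
    for row in rows:
        n_valid = sum(value is not None for value in row)
        if thresh is not _NO_DEFAULT:
            keep = n_valid >= thresh
        elif how == "all":
            keep = n_valid > 0  # drop only rows that are entirely None
        else:
            keep = n_valid == len(row)  # how == "any": drop rows with any None
        if keep:
            result.append(row)
    return result


rows = [[1, 2, 3], [None, 3, None], [None, None, None]]
print(drop_na_rows(rows, how="all"))  # [[1, 2, 3], [None, 3, None]]
print(drop_na_rows(rows, thresh=0))   # all three rows
try:
    drop_na_rows(rows, how="all", thresh=0)
except TypeError as exc:
    print(exc)  # pass either 'how' or 'thresh', not both
```

Java-style overloads can also be described to static type checkers with typing.overload, but those declarations only affect static checking; at runtime there is still a single function, so an explicit check like this one is what actually rejects the contradictory call.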

Upvotes: 0

Views: 288

Answers (1)

mck

Reputation: 42422

The relevant part of DataFrame.dropna in the pandas source code reads:

        if thresh is not None:
            mask = count >= thresh
        elif how == "any":
            mask = count == len(agg_obj._get_axis(agg_axis))
        elif how == "all":
            mask = count > 0

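So whenever thresh is given (i.e. is not None), it takes precedence and how is ignored. With thresh=0, every row satisfies count >= 0, so nothing is dropped.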
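To make the precedence concrete, here is a small pure-Python sketch that mirrors only the three quoted branches. The name legacy_dropna_mask and its list-based inputs are hypothetical (pandas works on Series, not lists): counts plays the role of count, the number of non-NA values per row, and n_columns stands in for len(agg_obj._get_axis(agg_axis)).

```python
from typing import List, Optional


def legacy_dropna_mask(
    counts: List[int], n_columns: int, how: str = "any", thresh: Optional[int] = None
) -> List[bool]:
    """Hypothetical mirror of the quoted branches: True means the row is kept."""
    if thresh is not None:
        # Checked first, so `how` is never consulted when thresh is given
        return [c >= thresh for c in counts]
    elif how == "any":
        # Keep only rows with no NA at all
        return [c == n_columns for c in counts]
    elif how == "all":
        # Keep rows with at least one non-NA value
        return [c > 0 for c in counts]
    raise ValueError(f"invalid how option: {how}")


# Non-NA counts for a hypothetical 3-column frame: full row, partly NA, all NA
counts = [3, 1, 0]
print(legacy_dropna_mask(counts, 3, how="all"))            # [True, True, False]
print(legacy_dropna_mask(counts, 3, how="all", thresh=0))  # [True, True, True]
```

Because the thresh branch is tested first, how='all', thresh=0 behaves exactly like thresh=0 alone, and since every count is at least 0, no row is dropped.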

Upvotes: 1
