Xhattam

Reputation: 307

Pandas idxmax() returns 0 if no value matches condition?

I'm trying to understand the behaviour of idxmax().

I'm using idxmax() to get all rows below the first row meeting a condition in a dataframe, like this:

df = df[df['A'].gt(0).idxmax():]

I'm then checking if the resulting dataframe is empty. There's one unit test where I expect an empty dataframe (no row meets the condition), but it was never empty, so I looked into it.

I found that if the condition is NEVER met, idxmax() returns 0 (instead of, say, None or instead of throwing an exception I could catch) - which clashes with the case where the condition IS MET at row 0.

Here's an example of what I'm seeing:

import pandas as pd

df = pd.DataFrame(data={'A':[0, 0, 0, 0]})  # no element where gt(0) is True
print("Truth values\n", df['A'].gt(0))      # checking the truth values of the Series
print("Index of first row where 'A' is at 0: ", df['A'].gt(0).idxmax())

The execution result:

>>> Truth values
0    False
1    False
2    False
3    False

Index of first row where 'A' is at 0: 0   <--- ???

And with a different dataframe:

df2 = pd.DataFrame(data={'A':[1, 0, 0, 0]})  # gt(0) is True at row 0 only
print("Truth values\n", df2['A'].gt(0))
print("Index of first row where 'A' is at 0: ", df2['A'].gt(0).idxmax())

The execution result:

Truth values
0    True
1    False
2    False
3    False

Index of first row where 'A' is at 0: 0

So we end up with the same behaviour for two different inputs.

My current solution: summing over 'A' and checking if the sum is 0, and doing something different if that's the case - which seems a bit overkill.

Am I using idxmax() wrong? Could someone shed some light on this, as the behaviour seems very counter-intuitive?

Thanks :)

Upvotes: 0

Views: 539

Answers (1)

Ynjxsjmh

Reputation: 30042

Series.idxmax will return the first row label of maximum value if multiple values equal the maximum.

Therefore, for the following Series, Series.idxmax returns index 0: every value is False, so they are all equal to the maximum and the first label wins.

0    False
1    False
2    False
3    False

For the following Series, Series.idxmax also returns index 0, this time because True is the maximum value. (In Python, True is greater than False; you can check by printing the result of True > False.)

0    True
1    False
2    False
3    False
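
Both cases can be checked directly. This is a minimal sketch; the names all_false and first_true are chosen here for illustration only:

```python
import pandas as pd

# Illustrative boolean Series matching the two cases above
all_false = pd.Series([False, False, False, False])
first_true = pd.Series([True, False, False, False])

print(True > False)         # True: booleans compare like 1 and 0
print(all_false.idxmax())   # 0: all values tie, so the first label is returned
print(first_true.idxmax())  # 0: True at label 0 is the maximum
```

The two results are identical, so idxmax() alone cannot say whether the condition was ever met.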

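In your df = df[df['A'].gt(0).idxmax():], the slice inside df[...] selects rows by position, whereas idxmax() returns an index label. The two only coincide for a default integer index. If you want to select rows by label, you need to use df.loc[df['A'].gt(0).idxmax():]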
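
One way to tell "no match" apart from "match at row 0" is to test the mask with any() before calling idxmax(). The helper below, rows_from_first_match, is a hypothetical name and not part of pandas; it is only a sketch, assuming an empty dataframe is the wanted result when no row meets the condition:

```python
import pandas as pd

def rows_from_first_match(df, mask):
    """Return the rows from the first True in mask onward (hypothetical helper).

    If mask contains no True at all, return an empty dataframe
    with the same columns instead of the whole dataframe.
    """
    if not mask.any():
        return df.iloc[0:0]            # empty slice, columns preserved
    return df.loc[mask.idxmax():]      # label-based slice from the first True

df_none = pd.DataFrame({'A': [0, 0, 0, 0]})   # condition never met
df_first = pd.DataFrame({'A': [1, 0, 0, 0]})  # condition met at row 0
df_mid = pd.DataFrame({'A': [0, 0, 3, 0]})    # condition met at row 2

print(rows_from_first_match(df_none, df_none['A'].gt(0)).empty)           # True
print(len(rows_from_first_match(df_first, df_first['A'].gt(0))))          # 4
print(list(rows_from_first_match(df_mid, df_mid['A'].gt(0)).index))       # [2, 3]
```

Checking mask.any() avoids summing the column and also works when the condition is something other than "greater than zero".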

Upvotes: 0
