Reputation: 307
I'm trying to understand the behaviour of idxmax().
I'm using idxmax() to get all rows starting from the first row that meets a condition in a dataframe, like this:
df = df[df['A'].gt(0).idxmax():]
I'm then checking if the resulting dataframe is empty. There's one unit test where I expect an empty dataframe (no row meets the condition), but it was never empty, so I looked into it.
I found that if the condition is NEVER met, idxmax() returns 0 (instead of, say, None, or throwing an exception I could catch), which clashes with the case where the condition IS MET at row 0.
Here's an example of what I'm seeing:
import pandas as pd
df = pd.DataFrame(data={'A':[0, 0, 0, 0]}) # no element where gt(0) is True
print("Truth values\n", df['A'].gt(0)) # checking the truth values of the Series
print("Index of first row where 'A' is at 0: ", df['A'].gt(0).idxmax())
The execution result:
>>> Truth values
0 False
1 False
2 False
3 False
Index of first row where 'A' is at 0: 0 <--- ???
And with a different dataframe:
df2 = pd.DataFrame(data={'A':[1, 0, 0, 0]})
print("Truth values\n", df2['A'].gt(0))
print("Index of first row where 'A' is at 0: ", df2['A'].gt(0).idxmax())
The execution result:
Truth values
0 True
1 False
2 False
3 False
Index of first row where 'A' is at 0: 0
So we end up with the same behaviour for two different inputs.
My current workaround is to sum over 'A', check whether the sum is 0, and handle that case separately, which seems a bit overkill.
Am I using idxmax() wrong? Could someone shed some light on this, as the behaviour seems very counter-intuitive?
Thanks :)
Upvotes: 0
Views: 539
Reputation: 30042
Series.idxmax returns the row label of the first occurrence of the maximum value when multiple values equal the maximum.
Therefore, in the following Series, index 0 is returned by Series.idxmax, since all values are False and therefore equal.
0 False
1 False
2 False
3 False
In the following Series, index 0 is returned by Series.idxmax, since True is the maximum value. (In Python, True is greater than False; you can check this by printing the result of True > False.)
0 True
1 False
2 False
3 False
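The two cases above can be reproduced with a small sketch. The Series names all_false and first_true are made up for illustration; the values mirror the two outputs shown in the question.

```python
import pandas as pd

# Hypothetical names; the values match the two boolean Series shown above.
all_false = pd.Series([False, False, False, False])
first_true = pd.Series([True, False, False, False])

# bool is a subclass of int in Python, so True compares greater than False.
print(True > False)

# Every value is tied for the maximum (False), so the first label wins.
print(all_false.idxmax())

# True is the unique maximum and it sits at label 0.
print(first_true.idxmax())
```

Both idxmax() calls give the same label, which is why the result alone cannot tell the two cases apart.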
In your df = df[df['A'].gt(0).idxmax():], note that slicing with [] and an integer slice selects rows by position, whereas idxmax() returns a row label. The two only coincide with a default integer index. To select rows by label, use df.loc[df['A'].gt(0).idxmax():]
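One way to separate "condition never met" from "condition met at row 0" is to check the mask with any() before calling idxmax(). The following is only a sketch: the helper name rows_from_first_match is hypothetical, and it assumes the index is unique so that label slicing with .loc is well defined.

```python
import pandas as pd

def rows_from_first_match(df, mask):
    """Return rows from the first True in mask onwards, or an empty frame.

    Hypothetical helper; assumes mask is a boolean Series aligned with df
    and that df has a unique index.
    """
    if not mask.any():
        # No row satisfies the condition: return an empty frame
        # that keeps the original columns.
        return df.iloc[0:0]
    # idxmax() gives the label of the first True, so slice by label.
    return df.loc[mask.idxmax():]

df = pd.DataFrame({'A': [0, 0, 0, 0]})   # condition never met
df2 = pd.DataFrame({'A': [1, 0, 0, 0]})  # condition met at row 0
df3 = pd.DataFrame({'A': [0, 0, 1, 0]})  # condition met at row 2

none_matched = rows_from_first_match(df, df['A'].gt(0))
from_row_0 = rows_from_first_match(df2, df2['A'].gt(0))
from_row_2 = rows_from_first_match(df3, df3['A'].gt(0))

print(none_matched.empty)
print(len(from_row_0))
print(list(from_row_2.index))
```

Checking mask.any() tests the same condition that is passed to idxmax(), so it avoids the separate sum over 'A' described in the question.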
Upvotes: 0