Reputation: 17339
I have a dataframe like this:
2017 2018 2012 2015 2014 2016
11647 0.044795 0.000000 0.000000 0.0 0.0 0.0
16389 0.089801 0.044900 0.000000 0.0 0.0 0.0
16404 0.014323 0.000000 0.000000 0.0 0.04 0.0
16407 0.052479 0.010442 0.009277 0.0 0.0 0.0
16409 0.000000 0.000000 0.004883 0.0 0.0 5.0
Note that the columns are not sorted. For each row, I need the latest year that has a non-zero value. The expected result is:
11647 2017
16389 2018
16404 2017
16407 2018
16409 2016
How can I do that?
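For anyone who wants to reproduce this, here is a minimal sketch of the sample frame. It assumes the row labels are an integer index and the year columns are integer labels; the printout above does not say whether they are ints or strings. The name `expected` is only for illustration.

```python
import pandas as pd

# Sample data from the question. Integer index and integer year
# labels are assumptions; the original dtypes are not shown.
df = pd.DataFrame(
    [
        [0.044795, 0.000000, 0.000000, 0.0, 0.00, 0.0],
        [0.089801, 0.044900, 0.000000, 0.0, 0.00, 0.0],
        [0.014323, 0.000000, 0.000000, 0.0, 0.04, 0.0],
        [0.052479, 0.010442, 0.009277, 0.0, 0.00, 0.0],
        [0.000000, 0.000000, 0.004883, 0.0, 0.00, 5.0],
    ],
    index=[11647, 16389, 16404, 16407, 16409],
    columns=[2017, 2018, 2012, 2015, 2014, 2016],  # deliberately unsorted
)

# The result asked for: latest year with a non-zero value, per row.
expected = pd.Series([2017, 2018, 2017, 2018, 2016], index=df.index)
```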
Upvotes: 0
Views: 1028
Reputation: 323316
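Using stack with a grouped max (Series.max(level=0) was removed in pandas 2.0, so group on the index level instead):
df[df.ne(0)].stack().reset_index(level=1)['level_1'].groupby(level=0).max()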
Out[386]:
11647 2017
16389 2018
16404 2017
16407 2018
16409 2016
Name: level_1, dtype: int64
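To make the stack idea easier to follow, here is a step-by-step sketch on three rows of the sample data. It assumes integer year labels, and it adds an explicit dropna() because whether stack() discards NaN on its own depends on the pandas version. The intermediate names (nonzero, stacked, latest) are only for illustration.

```python
import pandas as pd

# Three rows of the sample data; integer labels are an assumption.
df = pd.DataFrame(
    [
        [0.014323, 0.000000, 0.000000, 0.0, 0.04, 0.0],
        [0.052479, 0.010442, 0.009277, 0.0, 0.00, 0.0],
        [0.000000, 0.000000, 0.004883, 0.0, 0.00, 5.0],
    ],
    index=[16404, 16407, 16409],
    columns=[2017, 2018, 2012, 2015, 2014, 2016],
)

# Step 1: boolean indexing turns every zero into NaN.
nonzero = df[df.ne(0)]

# Step 2: stack to long form, keeping one entry per (row, year)
# pair that holds a non-zero value.
stacked = nonzero.stack().dropna()

# Step 3: take the year level of the MultiIndex and reduce it
# per row label with a grouped max.
years = stacked.index.get_level_values(1)
rows = stacked.index.get_level_values(0)
latest = pd.Series(years, index=rows).groupby(level=0).max()
```

A row that is zero everywhere contributes no entries after step 2, so it is simply missing from the result.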
Update: alternatively, multiply the non-zero mask by the column labels and take the row-wise max:
df.ne(0).mul(df.columns).max(1)
Out[423]:
11647 2017.0
16389 2018.0
16404 2017.0
16407 2018.0
16409 2016.0
dtype: float64
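Why the multiplication works: True * year is the year and False * year is 0, so each row ends up holding the years of its non-zero cells and zeros elsewhere, and the row maximum is the latest such year. A small sketch on three rows of the sample data, assuming numeric (integer) year labels, since the trick needs labels that can be multiplied:

```python
import pandas as pd

# Three rows of the sample data; integer labels are an assumption.
df = pd.DataFrame(
    [
        [0.014323, 0.000000, 0.000000, 0.0, 0.04, 0.0],
        [0.052479, 0.010442, 0.009277, 0.0, 0.00, 0.0],
        [0.000000, 0.000000, 0.004883, 0.0, 0.00, 5.0],
    ],
    index=[16404, 16407, 16409],
    columns=[2017, 2018, 2012, 2015, 2014, 2016],
)

mask = df.ne(0)                  # True where the value is non-zero
labelled = mask.mul(df.columns)  # year where True, 0 where False
latest = labelled.max(axis=1)    # largest year per row
```

Column order does not matter here, because max compares the labels themselves. A row with no non-zero value comes out as 0 rather than NaN.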
Upvotes: 1
Reputation: 59274
You can use idxmax on a DataFrame whose columns are sorted in descending order:
df[sorted(df.columns, reverse=True)].ne(0).idxmax(1)
11647 2017
16389 2018
16404 2017
16407 2018
16409 2016
dtype: object
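One caveat worth sketching: idxmax returns the first label at which the maximum occurs, so on a boolean mask it returns the first True, and for a row that is False everywhere it returns the first column. With descending columns that would wrongly report the newest year. The example below assumes integer year labels and adds a hypothetical all-zero row (label 99999, not in the question's data) to show one possible guard.

```python
import pandas as pd

# Three rows of the sample data; integer labels are an assumption.
df = pd.DataFrame(
    [
        [0.014323, 0.000000, 0.000000, 0.0, 0.04, 0.0],
        [0.052479, 0.010442, 0.009277, 0.0, 0.00, 0.0],
        [0.000000, 0.000000, 0.004883, 0.0, 0.00, 5.0],
    ],
    index=[16404, 16407, 16409],
    columns=[2017, 2018, 2012, 2015, 2014, 2016],
)
df.loc[99999] = 0.0  # hypothetical all-zero row, for illustration only

# Newest year first, so the first True in each row is the latest year.
ordered = df[sorted(df.columns, reverse=True)]
mask = ordered.ne(0)
latest = mask.idxmax(axis=1)

# Guard: blank out rows that have no non-zero value at all.
guarded = latest.where(mask.any(axis=1))
```

Here latest reports 2018 for row 99999, while guarded holds NaN for it (which also turns the result into floats).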
Upvotes: 2
Reputation: 17339
df.apply(lambda row: row[row > 0].index.max(), axis=1)
gives the expected result. If negative values can occur, use row != 0 instead so that they also count as non-zero.
Upvotes: 0