Reputation: 53
I have a DataFrame similar to this one.
name   company  count_2017  count_2018  count_2019  last_id
joe    abc      1           0           2           230283
cindy  bcd      0           2           0           239382
john   cde      0           1           0           238372
wang   def      0           0           3           1332
I need to return the last non-zero value and its year:
name   company  count_2017  count_2018  count_2019  last_id  year
joe    abc      1           0           2           230283   2019
cindy  bcd      0           2           0           239382   2018
john   cde      0           1           0           238372   2018
wang   def      5           0           0           1332     2017
The 'year' column is taken from the name of the last non-zero column. For example, in row 'joe' the last non-zero value is 2, which is in count_2019, so I return year 2019.
I used this code:
columns_first = df[[f'count_{yr}' for yr in range(2000, YEAR)]]
col_first = columns_first.columns
df["FIRST_YEAR"] = (columns_first.select_dtypes(float)
                    .gt(0)
                    .dot(col_first.str[-4:] + " ")
                    .str.split()
                    .str[0])
However, this only gives me the first non-zero value, not the last one. Is there any way to fix this piece of code?
The result should be the original columns plus the year column as above, i.e. year = 2019, 2018, 2018, 2017.
Upvotes: 1
Views: 252
Reputation: 323226
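You can use idxmax on the columns in reverse order. idxmax returns the first column where the condition is True, so after reversing, that is the last non-zero column of the original frame: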
df.filter(like='count_').ne(0).iloc[:,::-1].idxmax(1).str.split('_').str[-1]
0 2019
1 2018
2 2018
3 2019
dtype: object
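To get the original columns plus the year, as the question asks, the one-liner can be assigned back to the frame. Below is a minimal sketch built on the question's sample input. The `last_value` column name and the `has_any` mask for rows with no non-zero count are my own additions for illustration, not something from the question.

```python
import pandas as pd

# Sample data copied from the question's input table.
df = pd.DataFrame({
    "name": ["joe", "cindy", "john", "wang"],
    "company": ["abc", "bcd", "cde", "def"],
    "count_2017": [1, 0, 0, 0],
    "count_2018": [0, 2, 1, 0],
    "count_2019": [2, 0, 0, 3],
    "last_id": [230283, 239382, 238372, 1332],
})

counts = df.filter(like="count_")   # only the count_YYYY columns
nonzero = counts.ne(0)              # True where the count is non-zero

# Reverse the column order so idxmax (first True) lands on the LAST non-zero column.
last_col = nonzero.iloc[:, ::-1].idxmax(axis=1)

# A row with no non-zero count would still get a column label from idxmax,
# so mask such rows out (assumption: a missing value is the desired result there).
has_any = nonzero.any(axis=1)

df["year"] = last_col.str.split("_").str[-1].where(has_any)

# Hypothetical extra column: the last non-zero value itself.
last_value = pd.Series(
    [counts.at[i, c] for i, c in last_col.items()], index=df.index
)
df["last_value"] = last_value.where(has_any)

print(df)
```

If you would rather keep the approach from the question, taking `.str[-1]` instead of `.str[0]` after `.str.split()` should pick the last matching year rather than the first, since the joined string lists the years in column order.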
Upvotes: 1