LearningCode

Reputation: 53

return the last non-zero value

I have a DataFrame similar to this one.

name   company  count_2017  count_2018  count_2019  last_id
joe    abc      1           0           2           230283
cindy  bcd      0           2           0           239382
john   cde      0           1           0           238372
wang   def      0           0           3           1332

I need to return the last non-zero value and its year:

name   company  count_2017  count_2018  count_2019  last_id  year
joe    abc      1           0           2           230283   2019
cindy  bcd      0           2           0           239382   2018
john   cde      0           1           0           238372   2018
wang   def      5           0           0           1332     2017

The 'year' column is taken from the name of the last non-zero count column. For example, in row 'joe' the last non-zero value is 2, which sits in count_2019, so the year is 2019.

I used this code:

columns_first = df[[f'count_{yr}' for yr in range(2000, YEAR)]]
col_first = columns_first.columns
df["FIRST_YEAR"] = (columns_first.select_dtypes(float)
                 .gt(0)
                 .dot(col_first.str[-4:] + " ")
                 .str.split()
                 .str[0])

However, this only gives me the first non-zero value, not the last one. Is there any way to fix this piece of code?

The result should be the original columns plus the year column shown above: 2019, 2018, 2018, 2017.
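For reference, the dot-based snippet above returns the first year because of the final `.str[0]`; taking `.str[-1]` picks the last matching year instead. A minimal sketch, assuming the count columns are numeric and using the sample data from this question (the sketch drops `select_dtypes(float)`, since the sample counts look like integers, and the `counts` and `years` names are only illustrative):

```python
import pandas as pd

# Sample data copied from the question's input table.
df = pd.DataFrame({
    "name": ["joe", "cindy", "john", "wang"],
    "company": ["abc", "bcd", "cde", "def"],
    "count_2017": [1, 0, 0, 0],
    "count_2018": [0, 2, 1, 0],
    "count_2019": [2, 0, 0, 3],
    "last_id": [230283, 239382, 238372, 1332],
})

counts = df.filter(like="count_")

# "2017 ", "2018 ", "2019 ": the trailing space lets us split the joined string later.
years = counts.columns.str[-4:] + " "

# The boolean mask dotted with the year labels concatenates the years of all
# non-zero columns per row; the last token is the last non-zero year.
df["year"] = counts.gt(0).dot(years).str.split().str[-1]

print(df)
```

With the input table as posted, the last row ('wang') has its only non-zero count in count_2019. A row whose counts are all zero produces an empty string, so its year comes out as missing.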

Upvotes: 1

Views: 252

Answers (1)

BENY

Reputation: 323226

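You can use idxmax: compare the count columns to zero, reverse the column order, and idxmax returns the first True in each row, which is the last non-zero column. Then split off the year: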

df.filter(like='count_').ne(0).iloc[:,::-1].idxmax(1).str.split('_').str[-1]
0    2019
1    2018
2    2018
3    2019
dtype: object
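One thing to watch with this approach: for a row where every count is zero, idxmax still returns a column label (the first one in the reversed order), so the row would be reported with the latest year. Below is a minimal, self-contained sketch of the same idea with that case masked out. The function name `add_last_nonzero_year` and the `prefix` parameter are only illustrative, and the data is the sample from the question.

```python
import pandas as pd


def add_last_nonzero_year(df, prefix="count_"):
    """Return a copy of df with a 'year' column for the last non-zero count."""
    counts = df.filter(like=prefix)
    nonzero = counts.ne(0)

    # Reverse the column order so idxmax finds the right-most True per row.
    last_col = nonzero.iloc[:, ::-1].idxmax(axis=1)
    year = last_col.str.split("_").str[-1]

    out = df.copy()
    # idxmax also returns a label for all-zero rows, so blank those out.
    out["year"] = year.where(nonzero.any(axis=1))
    return out


# Sample data copied from the question's input table.
df = pd.DataFrame({
    "name": ["joe", "cindy", "john", "wang"],
    "company": ["abc", "bcd", "cde", "def"],
    "count_2017": [1, 0, 0, 0],
    "count_2018": [0, 2, 1, 0],
    "count_2019": [2, 0, 0, 3],
    "last_id": [230283, 239382, 238372, 1332],
})

result = add_last_nonzero_year(df)
print(result)
```

The year values are strings here, as in the output above; convert them with `astype` if numeric years are needed.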

Upvotes: 1
