A2N15
A2N15

Reputation: 605

conditions inside conditions pandas

below is my DF in which I want to create a column based on other columns

test = pd.DataFrame({"Year_2017" : [np.nan, np.nan, np.nan, 4], "Year_2018" : [np.nan, np.nan, 3, np.nan], "Year_2019" : [np.nan, 2, np.nan, np.nan], "Year_2020" : [1, np.nan, np.nan, np.nan]})

    Year_2017   Year_2018   Year_2019   Year_2020
0       NaN         NaN         NaN          1
1       NaN         NaN          2          NaN
2       NaN          3          NaN         NaN
3        4          NaN         NaN         NaN

The aim will be to create a new column and take value of the columns which is notna()

Below is what I tried without success..

test['Final'] = np.where(test.Year_2017.isna(), test.Year_2018,
                              np.where(test.Year_2018.isna(), test.Year_2019,
                                      np.where(test.Year_2019.isna(), test.Year_2020, test.Year_2019)))

     Year_2017  Year_2018   Year_2019   Year_2020   Final
0      NaN        NaN          NaN          1       NaN
1      NaN        NaN           2          NaN      NaN
2      NaN         3           NaN         NaN       3
3       4         NaN          NaN         NaN      NaN

The expected output:

     Year_2017  Year_2018   Year_2019   Year_2020   Final
0      NaN        NaN          NaN          1        1
1      NaN        NaN           2          NaN       2
2      NaN         3           NaN         NaN       3
3       4         NaN          NaN         NaN       4

Upvotes: 1

Views: 49

Answers (2)

jezrael
jezrael

Reputation: 862521

You can forward or back filling missing values and then select last or first column:

test['Final'] = test.ffill(axis=1).iloc[:, -1]

test['Final'] = test.bfill(axis=1).iloc[:, 0]

If there is only one non missing values per rows and numeric use:

test['Final'] = test.min(1)
test['Final'] = test.max(1)
test['Final'] = test.mean(1)
test['Final'] = test.sum(1, min_count=1)

Upvotes: 2

mozway
mozway

Reputation: 260430

I you only have a single non NA value per row, you can use:

df['Final'] = test.max(axis=1)

(or other aggregators)

Upvotes: 1

Related Questions