Reputation: 605
below is my DF in which I want to create a column based on other columns
test = pd.DataFrame({"Year_2017" : [np.nan, np.nan, np.nan, 4], "Year_2018" : [np.nan, np.nan, 3, np.nan], "Year_2019" : [np.nan, 2, np.nan, np.nan], "Year_2020" : [1, np.nan, np.nan, np.nan]})
Year_2017 Year_2018 Year_2019 Year_2020
0 NaN NaN NaN 1
1 NaN NaN 2 NaN
2 NaN 3 NaN NaN
3 4 NaN NaN NaN
The aim will be to create a new column and take value of the columns which is notna()
Below is what I tried without success..
test['Final'] = np.where(test.Year_2017.isna(), test.Year_2018,
np.where(test.Year_2018.isna(), test.Year_2019,
np.where(test.Year_2019.isna(), test.Year_2020, test.Year_2019)))
Year_2017 Year_2018 Year_2019 Year_2020 Final
0 NaN NaN NaN 1 NaN
1 NaN NaN 2 NaN NaN
2 NaN 3 NaN NaN 3
3 4 NaN NaN NaN NaN
The expected output:
Year_2017 Year_2018 Year_2019 Year_2020 Final
0 NaN NaN NaN 1 1
1 NaN NaN 2 NaN 2
2 NaN 3 NaN NaN 3
3 4 NaN NaN NaN 4
Upvotes: 1
Views: 49
Reputation: 862521
You can forward or back filling missing values and then select last or first column:
test['Final'] = test.ffill(axis=1).iloc[:, -1]
test['Final'] = test.bfill(axis=1).iloc[:, 0]
If there is only one non missing values per rows and numeric use:
test['Final'] = test.min(1)
test['Final'] = test.max(1)
test['Final'] = test.mean(1)
test['Final'] = test.sum(1, min_count=1)
Upvotes: 2
Reputation: 260430
I you only have a single non NA value per row, you can use:
df['Final'] = test.max(axis=1)
(or other aggregators)
Upvotes: 1