Reputation: 57
Suppose a generic dataframe with four numeric columns and a final column that, row-wise, gathers the values of the previous four columns in a list.
Let me provide a code example of the initial dataframe:
import pandas as pd
import numpy as np
x = pd.DataFrame({'col_1': [np.nan, 35, 27, 50],
                  'col_2': [15, 12, np.nan, np.nan],
                  'col_3': [12, 15, 40, np.nan],
                  'col_4': [np.nan, np.nan, np.nan, 5],
                  })
col_names = x.filter(regex='col', axis='columns').columns.tolist()
x['fifth_col'] = x[col_names].values.tolist()
   col_1  col_2  col_3  col_4                fifth_col
0    NaN   15.0   12.0    NaN   [nan, 15.0, 12.0, nan]
1   35.0   12.0   15.0    NaN  [35.0, 12.0, 15.0, nan]
2   27.0    NaN   40.0    NaN   [27.0, nan, 40.0, nan]
3   50.0    NaN    NaN    5.0    [50.0, nan, nan, 5.0]
I need to create a sixth column that shows the first non-null element of each list in the fifth column. I tried the following statement, but it does not work:
x['sixth_col'] = x['fifth_col'].notna().apply(lambda x: x.first)
Upvotes: 1
Views: 84
Reputation: 6333
You can do this directly, without the need for fifth_col. Just stack the data frame. Since you want the first non-null element per row, your group is the first index level (level=0). So just get the first value of each group.
x['sixth_col'] = x.stack().groupby(level=0).first()
   col_1  col_2  col_3  col_4  sixth_col
0    NaN   15.0   12.0    NaN       15.0
1   35.0   12.0   15.0    NaN       35.0
2   27.0    NaN   40.0    NaN       27.0
3   50.0    NaN    NaN    5.0       50.0
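As a possible alternative to stacking (not part of the answer above, just a sketch on the question's sample data), the same result can be obtained by back-filling along each row and reading the first column. The name first_non_null is only illustrative.

```python
import numpy as np
import pandas as pd

# Sample frame from the question, without fifth_col.
x = pd.DataFrame({
    'col_1': [np.nan, 35, 27, 50],
    'col_2': [15, 12, np.nan, np.nan],
    'col_3': [12, 15, 40, np.nan],
    'col_4': [np.nan, np.nan, np.nan, 5],
})

# Back-fill along each row (axis=1): every NaN is replaced by the next
# valid value to its right, so the first column ends up holding the
# first non-null value of the row.
first_non_null = x.bfill(axis=1).iloc[:, 0]

# Assign only after computing, so the new column is not part of the fill.
x['sixth_col'] = first_non_null
```

A row that contains only NaN would keep NaN in sixth_col with this approach.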
Upvotes: 1
Reputation: 59579
explode the list, then take the first value along the index.
x['sixth_col'] = x['fifth_col'].explode().groupby(level=0).first()
   col_1  col_2  col_3  col_4                fifth_col  sixth_col
0    NaN   15.0   12.0    NaN   [nan, 15.0, 12.0, nan]       15.0
1   35.0   12.0   15.0    NaN  [35.0, 12.0, 15.0, nan]       35.0
2   27.0    NaN   40.0    NaN   [27.0, nan, 40.0, nan]       27.0
3   50.0    NaN    NaN    5.0    [50.0, nan, nan, 5.0]       50.0
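If you would rather work on each list directly, closer to what the question attempted, a plain Python helper passed to apply is another option. Note that notna() on a column of lists checks the list objects themselves, not their elements, so it returns True for every row. The sketch below assumes the lists look like the sample data; first_valid is a hypothetical helper name, not a pandas function.

```python
import numpy as np
import pandas as pd

# Only the list column from the question is needed here.
x = pd.DataFrame({'fifth_col': [
    [np.nan, 15.0, 12.0, np.nan],
    [35.0, 12.0, 15.0, np.nan],
    [27.0, np.nan, 40.0, np.nan],
    [50.0, np.nan, np.nan, 5.0],
]})

def first_valid(values):
    """Return the first non-null item of a list, or NaN if there is none."""
    # next() stops at the first element that passes the null check;
    # the second argument is the fallback for an all-null list.
    return next((v for v in values if pd.notna(v)), np.nan)

x['sixth_col'] = x['fifth_col'].apply(first_valid)
```

This loops in Python row by row, so on large frames the explode or stack versions are likely to be faster.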
Upvotes: 1