How to take the first non-null element, row-wise, from a column that consists of lists?

Question

Suppose a generic dataframe with four numeric columns and a final column that gathers, row-wise, the values of those four columns into a list.

Let me provide a code example of the initial dataframe:

import pandas as pd
import numpy as np

# Four numeric columns with scattered missing values
x = pd.DataFrame({'col_1': [np.nan, 35, 27, 50],
                  'col_2': [15, 12, np.nan, np.nan],
                  'col_3': [12, 15, 40, np.nan],
                  'col_4': [np.nan, np.nan, np.nan, 5],
                  })
# Collect each row's values from the col_* columns into a list
col_names = x.filter(regex='col', axis='columns').columns.tolist()
x['fifth_col'] = x[col_names].values.tolist()

   col_1  col_2  col_3  col_4                fifth_col
0    NaN   15.0   12.0    NaN   [nan, 15.0, 12.0, nan]
1   35.0   12.0   15.0    NaN  [35.0, 12.0, 15.0, nan]
2   27.0    NaN   40.0    NaN   [27.0, nan, 40.0, nan]
3   50.0    NaN    NaN    5.0    [50.0, nan, nan, 5.0]

I need to create a sixth column that holds the first non-null element of each list in the fifth column. I tried the following statement, but it does not work:

x['sixth_col'] = x['fifth_col'].notna().apply(lambda x: x.first)

Arturo Sbr · Accepted Answer

You can do this directly from the four numeric columns, without needing fifth_col. Stack the data frame so that every value is indexed by its original row label. Since you want the first non-null element per row, group by the first index level (level=0) and take the first value of each group; first() skips nulls.

x['sixth_col'] = x.stack().groupby(level=0).first()

   col_1  col_2  col_3  col_4  sixth_col
0    NaN   15.0   12.0    NaN       15.0
1   35.0   12.0   15.0    NaN       35.0
2   27.0    NaN   40.0    NaN       27.0
3   50.0    NaN    NaN    5.0       50.0
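
If fifth_col already exists and you would rather read from the lists themselves, the following is a minimal sketch of two alternatives, not part of the answer above. The helper name first_non_null and the variable alt are made up for illustration. The first option scans each list and returns its first non-null element; the second skips the lists and back-fills across the numeric columns.

```python
import numpy as np
import pandas as pd

x = pd.DataFrame({'col_1': [np.nan, 35, 27, 50],
                  'col_2': [15, 12, np.nan, np.nan],
                  'col_3': [12, 15, 40, np.nan],
                  'col_4': [np.nan, np.nan, np.nan, 5]})
col_names = x.columns.tolist()
x['fifth_col'] = x[col_names].values.tolist()


def first_non_null(values):
    """Hypothetical helper: first element that is not NaN/None, else NaN."""
    return next((v for v in values if pd.notna(v)), np.nan)


# Option 1: scan each list stored in fifth_col
x['sixth_col'] = x['fifth_col'].apply(first_non_null)

# Option 2: back-fill along each row, then keep the leftmost column
alt = x[col_names].bfill(axis=1).iloc[:, 0]
```

Both options return NaN for a row whose values are all null. Option 1 calls a Python function once per row, so on large frames the column-wise version is likely to be faster.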
