Retrieving the last non-None value in a Pandas DataFrame

Question

I've got a huge DataFrame with a lot of None values in many columns. If I'm interested in one specific column, is there an easy way to get the "last" valid values from the other columns? I tried to set up a simple example:

import pandas as pd

df = pd.DataFrame([[1, None, None, 123],
                   [2, None, 11, None],
                   [3, 13, None, None],
                   [4, None, None, 124],
                   [5, None, 10, None],
                   [6, None, None, 126]
                  ],
                  columns=['id', 'value1', 'value2', 'value3']
                 )

Say that it is value3 that is of interest, then I'm looking for the easiest way to get this data:

1, None, None, 123
4, 13, 11, 124
6, 13, 10, 126

Here the first row has a valid value3 value and None for value1 and value2, since no earlier values exist. The second row combines data from rows 2, 3 and 4.

Code Different · Accepted Answer

Just call ffill to get the last non-null value of the other columns, then combine it with the non-null values from the column you are interested in:

filled = df.ffill()

# The column you are interested in
col = 'value3'
result = df[[col]].dropna() \
            .join(filled.drop(col, axis=1)) \
            [df.columns.to_list()]
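
For anyone who wants to run this end to end, here is a minimal self-contained sketch using the example frame from the question. It also shows a shorter variant that is not part of the answer above: forward-fill everything, then keep only the rows where the original column was non-null. The names joined and masked are just illustrative. On this data both approaches should give the same three rows, with NaN where the question shows None.

```python
import pandas as pd

# Example frame from the question; None becomes NaN in the numeric columns.
df = pd.DataFrame(
    [[1, None, None, 123],
     [2, None, 11, None],
     [3, 13, None, None],
     [4, None, None, 124],
     [5, None, 10, None],
     [6, None, None, 126]],
    columns=['id', 'value1', 'value2', 'value3'],
)

col = 'value3'  # the column of interest

# Approach from the answer: forward-fill, then join onto the non-null rows of col.
filled = df.ffill()
joined = df[[col]].dropna().join(filled.drop(col, axis=1))[df.columns.to_list()]

# Alternative sketch (an assumption, not from the answer): forward-fill the whole
# frame and select rows with a boolean mask built from the ORIGINAL column.
# On the selected rows, col already held a real value, so ffill leaves it unchanged.
masked = df.ffill()[df[col].notna()]

print(masked)
```

The mask variant avoids the join and the column reordering, but it relies on the mask being computed from the unfilled df, not from filled.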
