Reputation: 118
I have a large dataframe that looks similar to this:
As you can tell, there are plenty of blanks. I want to propagate non-null values forward (so for example, in the first row 1029 goes to 1963.02.12 column, between 1029 and 1043) but only up to the last entry, that is it should stop propagating when it encounters the last non-null value (for D it would be the 1992.03.23 column, but for A it'd be 1963.09.21, just outside the screenshot).
Is there a quicker way to achieve this without fiddling around with df.fillna(method='ffill', limit=x)
? My original idea was to remember the date of the last entry, propagate values to the end of the row, and then fill the row with nulls after the saved date. I've been wondering if there is a cleverer method to achieve the same result.
Upvotes: 2
Views: 580
Reputation: 12515
This might not be very performant. I couldn't get a pure-pandas solution (which obviously doesn't guarantee performance anyway!)
>>> df
a b c d e
0 0.0 NaN NaN 1.0 NaN
1 0.0 1.0 NaN 2.0 3.0
2 NaN 1.0 2.0 NaN 4.0
What happens if we just ffill
everything?
>>> df.ffill(axis=1)
a b c d e
0 0.0 0.0 0.0 1.0 1.0
1 0.0 1.0 1.0 2.0 3.0
2 NaN 1.0 2.0 2.0 4.0
We need to go back and add NaNs for the last null column in each row:
>>> new_data = []
>>> for _, row in df.iterrows():
... new_row = row.ffill()
... null_columns = [col for col, is_null in zip(row.index, row.isnull().values) if is_null]
... # replace value in last column with NaN
... if null_columns:
... last_null_column = null_columns[-1]
... new_row.ix[last_null_column] = np.nan
... new_data.append(new_row.to_dict())
...
>>> new_df = pd.DataFrame.from_records(new_data)
>>> new_df
a b c d e
0 0.0 0.0 0.0 1.0 NaN
1 0.0 1.0 NaN 2.0 3.0
2 NaN 1.0 2.0 NaN 4.0
Upvotes: 1