I've got a huge DataFrame with a lot of None values in many columns. If I'm interested in one specific column, is there an easy way to get the "last" valid values of the other columns for each row where that column has a value? I tried to set up a simple example:
import pandas as pd

df = pd.DataFrame([[1, None, None, 123],
                   [2, None, 11, None],
                   [3, 13, None, None],
                   [4, None, None, 124],
                   [5, None, 10, None],
                   [6, None, None, 126]],
                  columns=['id', 'value1', 'value2', 'value3'])
Say that it is value3 that is of interest; then I'm looking for the easiest way to get this data:
1, None, None, 123
4, 13, 11, 124
6, 13, 10, 126
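Here the first row is the first row with a valid value3 value, with None for value1 and value2. The second row combines data from the rows with id 2, 3 and 4, and the third row combines value1 from id 3, value2 from id 5, and id and value3 from id 6.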
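To pin down the semantics described above, here is a minimal pure-Python sketch, assuming rows are processed in order and None marks a missing value. The function name `last_valid_rows` is hypothetical. It remembers the last non-None value seen in each column and emits a snapshot whenever the column of interest is non-None.

```python
from typing import Any, List, Optional

Row = List[Optional[Any]]


def last_valid_rows(rows: List[Row], key_index: int) -> List[Row]:
    """Hypothetical helper: for each row whose key column is not None,
    return the most recent non-None value of every column seen so far."""
    last: Row = [None] * (len(rows[0]) if rows else 0)
    out: List[Row] = []
    for row in rows:
        # Update the running "last seen" value for each column
        for i, value in enumerate(row):
            if value is not None:
                last[i] = value
        # Emit a snapshot only when the column of interest has a value
        if row[key_index] is not None:
            out.append(list(last))
    return out


data = [[1, None, None, 123],
        [2, None, 11, None],
        [3, 13, None, None],
        [4, None, None, 124],
        [5, None, 10, None],
        [6, None, None, 126]]

for r in last_valid_rows(data, key_index=3):
    print(r)
```

This is a forward fill followed by a row filter, which is what the pandas answer below expresses in vectorized form.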
Just call ffill to get the last non-null value of each of the other columns, then combine it with the non-null values from the column you are interested in:
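# Forward-fill so every row carries the last non-null value seen in each column
filled = df.ffill()

# The column you are interested in
col = 'value3'

# Keep only the rows where `col` is non-null, attach the forward-filled
# values of the other columns, then restore the original column order
result = (df[[col]].dropna()
          .join(filled.drop(col, axis=1))
          [df.columns.to_list()])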
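For reference, a self-contained sketch of the approach above that rebuilds the example frame from the question, assuming pandas is installed. Note that pandas stores the missing entries as NaN (and the partially empty columns as floats), so the first row shows NaN rather than None. The `shortcut` variable is an equivalent variant, not part of the original answer: since ffill leaves `col` unchanged on rows where it is already non-null, selecting those rows directly from the filled frame gives the same table.

```python
import pandas as pd

df = pd.DataFrame([[1, None, None, 123],
                   [2, None, 11, None],
                   [3, 13, None, None],
                   [4, None, None, 124],
                   [5, None, 10, None],
                   [6, None, None, 126]],
                  columns=['id', 'value1', 'value2', 'value3'])

col = 'value3'

# Approach from the answer: forward-fill, then join onto the non-null rows of `col`
filled = df.ffill()
result = (df[[col]].dropna()
          .join(filled.drop(col, axis=1))
          [df.columns.to_list()])

# Equivalent variant: boolean-mask the forward-filled frame directly
shortcut = filled[df[col].notna()]

print(result)
```

Both keep the original row labels (0, 3 and 5 here); call `.reset_index(drop=True)` if a fresh index is preferred.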