Waroulolz

Reputation: 307

Pandas fill nan values with a split of the first available value

I'm trying to replace NaN values in a DataFrame by splitting the previous available value evenly across itself and all the NaN values that follow it.

In the example below:

import pandas as pd
df = [100, None, None, 40, None, 120]
df = pd.DataFrame(df)

I would like to get:

[33.33, 33.33, 33.33, 20, 20, 120]

If I could find a way to count the number of NaN values following each value in my column, then I could run some computations to achieve the split.
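
Just to illustrate the counting step I have in mind (a naive loop, not what I want to end up with):

import pandas as pd

df = pd.DataFrame([100, None, None, 40, None, 120])

# For each non-NaN value, count how many NaNs follow it.
values = df[0].tolist()
trailing_nans = {}
for i, v in enumerate(values):
    if pd.notna(v):
        j = i + 1
        while j < len(values) and pd.isna(values[j]):
            j += 1
        trailing_nans[i] = j - i - 1

print(trailing_nans)  # {0: 2, 3: 1, 5: 0}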

Upvotes: 3

Views: 797

Answers (2)

U13-Forward

Reputation: 71600

Maybe use ffill, take the first column, and divide it by the group counts of the ffilled DataFrame, using transform('count') so the counts keep the same length as the DataFrame:

print(df.ffill()[0] / df.ffill().groupby(0)[0].transform('count'))
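
With the example DataFrame from the question, this should print something like:

0     33.333333
1     33.333333
2     33.333333
3     20.000000
4     20.000000
5    120.000000
Name: 0, dtype: float64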

Upvotes: 0

jezrael

Reputation: 863166

Use:

import pandas as pd
df = [100, None, None, 40, None, 120]
df = pd.DataFrame(df, columns=['a'])

s = df['a'].ffill() / df.groupby(df['a'].notna().cumsum())['a'].transform('size')
print (s)

0     33.333333
1     33.333333
2     33.333333
3     20.000000
4     20.000000
5    120.000000
Name: a, dtype: float64
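
If you then want to write the split values back into the column (assuming that is the desired end result), assign the Series back:

df['a'] = s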

Details:

You can replace missing values with the previous non-NaN value using ffill:

print (df['a'].ffill())
0    100.0
1    100.0
2    100.0
3     40.0
4     40.0
5    120.0
Name: a, dtype: float64

Then flag non-missing values with Series.notna and create group identifiers with Series.cumsum:

print (df['a'].notna().cumsum())
0    1
1    1
2    1
3    2
4    2
5    3
Name: a, dtype: int32

And get the size of each group, broadcast back to the original length, with GroupBy.transform:

print (df.groupby(df['a'].notna().cumsum())['a'].transform('size'))
0    3
1    3
2    3
3    2
4    2
5    1
Name: a, dtype: int64

Upvotes: 5
