Reputation: 307
I'm trying to replace NaN values in a DataFrame by splitting the previous available value evenly between itself and all the NaN values that follow it.
In the example below:
import pandas as pd
df = [100, None, None, 40, None, 120]
df = pd.DataFrame(df)
I would like to get:
[33.33, 33.33, 33.33, 20, 20, 120]
If I could find a way to count the number of NaN values following each value in my column, then I could run some computations to achieve the split.
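For instance, something like this sketch is what I have in mind for the counting step (not sure it's the right approach):
import pandas as pd

df = pd.DataFrame([100, None, None, 40, None, 120])

# Label each run that starts at a non-NaN value, then count
# the NaNs inside each run.
group_ids = df[0].notna().cumsum()
nan_counts = df[0].isna().groupby(group_ids).sum()
print(nan_counts)
# 1    2
# 2    1
# 3    0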
Upvotes: 3
Views: 797
Reputation: 71600
Maybe use ffill, take the first column, and divide it by the counts from the ffilled DataFrame grouped by that column with transform('count'), which gets the counts while keeping the same length as the DataFrame:
print(df.ffill()[0] / df.ffill().groupby(0)[0].transform('count'))
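Note this groups by the filled values themselves, so it assumes each value starts at most one run of NaNs; if the same value appeared before two different runs, their counts would be merged.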
Upvotes: 0
Reputation: 863166
Use:
import pandas as pd
df = [100, None, None, 40, None, 120]
df = pd.DataFrame(df, columns=['a'])
s = df['a'].ffill() / df.groupby(df['a'].notna().cumsum())['a'].transform('size')
print (s)
0 33.333333
1 33.333333
2 33.333333
3 20.000000
4 20.000000
5 120.000000
Name: a, dtype: float64
Details:
You can replace missing values with the previous non-NaN values using ffill:
print (df['a'].ffill())
0 100.0
1 100.0
2 100.0
3 40.0
4 40.0
5 120.0
Name: a, dtype: float64
Then mark non-missing values with Series.notna and create group identifiers with Series.cumsum:
print (df['a'].notna().cumsum())
0 1
1 1
2 1
3 2
4 2
5 3
Name: a, dtype: int32
And get the size of each group, broadcast back to the original length, with GroupBy.transform:
print (df.groupby(df['a'].notna().cumsum())['a'].transform('size'))
0 3
1 3
2 3
3 2
4 2
5 1
Name: a, dtype: int64
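A note on why the groups are built from notna().cumsum() rather than from the values themselves: the cumsum IDs keep two runs separate even when they start with the same value. A quick check with a hypothetical input containing a repeated value:
df2 = pd.DataFrame({'a': [100, None, 100, None]})
s2 = df2['a'].ffill() / df2.groupby(df2['a'].notna().cumsum())['a'].transform('size')
print (s2)
# each run of [100, NaN] splits to 50.0, not 25.0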
Upvotes: 5