Reputation: 725
I have the following dataframe and I want to fill in the forward NaN cells with the last available value divided by the number of the NaN+1:
Update: I would like to iterate to the last x columns and do that manipulations, the columns.values
is a tuple.( ie it has multiple rows).
It should be similar to the DataFrame.fillna(value=None, method=ffill)
, but not with the last available value though.
The final dataframe should look like that:
Thanks for your help in advance.
Upvotes: 2
Views: 995
Reputation: 164633
You can use GroupBy
+ transform
with 'first'
and 'size'
. This is similar to @sacul's solution, but avoids the lambda
which limits the benefits of vectorisation:
g = df.groupby(df['B'].notnull().cumsum())['B']
df['B'] = g.transform('first') / g.transform('size')
There's a 1,500x performance differential for a large dataframe:
# Python 3.6.0, Pandas 0.19.2
np.random.seed(0)
df = pd.DataFrame({'A':np.random.random(10),'B':[8,np.nan,np.nan,np.nan,np.nan,7,np.nan,np.nan,np.nan,np.nan]})
def jpp(df):
g = df.groupby(df['B'].notnull().cumsum())['B']
df['B'] = g.transform('first') / g.transform('size')
return df
def sac(df):
df['B'] = df.groupby(df.B.notnull().cumsum(),as_index=False)['B']\
.transform(lambda x: x.iloc[0]/x.size)
return df
df = pd.concat([df]*1000)
assert jpp(df).equals(sac(df))
%timeit jpp(df) # 5.07 ms per loop
%timeit sac(df) # 7.84 s per loop
Upvotes: 3
Reputation: 51335
You could use a groupby instead of ffill
: use the cumulative sum of the boolean resulting from notnull
as your grouping variable, then transform B
by taking the first value and dividing it by the size of each group:
df = pd.DataFrame({'A':np.random.random(10),'B':[8,np.nan,np.nan,np.nan,np.nan,7,np.nan,np.nan,np.nan,np.nan]})
>>> df
A B
0 0.899200 8.0
1 0.011443 NaN
2 0.227406 NaN
3 0.602941 NaN
4 0.214716 NaN
5 0.534166 7.0
6 0.519983 NaN
7 0.273051 NaN
8 0.454338 NaN
9 0.537210 NaN
df['B'] = (df.groupby(df.B.notnull().cumsum(),as_index=False)['B']
.transform(lambda x: x.iloc[0]/x.size))
>>> df
A B
0 0.899200 1.6
1 0.011443 1.6
2 0.227406 1.6
3 0.602941 1.6
4 0.214716 1.6
5 0.534166 1.4
6 0.519983 1.4
7 0.273051 1.4
8 0.454338 1.4
9 0.537210 1.4
Upvotes: 1