Reputation: 676
I have a column like this,
valueCount
0.0
nan
2.0
1.0
1.0
1.0
nan
nan
nan
4.0
I want to fillna based on next value(add) or previous value(subtract) available. so the result should be,
valueCount
0.0
**1.0**
2.0
1.0
1.0
1.0
**1.0**
**2.0**
**3.0**
4.0
i know it is very condititonal based, if my previous value is 0 i can add +1 to the nan row, else i should start adding from 0,1,2 and so on.
i can do this algorithm in simple list of python, but in pandas is there any simple way ?
Upvotes: 1
Views: 2963
Reputation: 11261
In this case you could also use interpolation and rounding down. But this will not work for all data sets.
Consider:
import pandas as pd
import numpy as np
s = pd.Series([0, np.nan, 2, 1, 1, 1, np.nan, np.nan, np.nan, 4])
Then np.floor(s.interpolate())
gives
0 0.0
1 1.0
2 2.0
3 1.0
4 1.0
5 1.0
6 1.0
7 2.0
8 3.0
9 4.0
dtype: float64
Upvotes: 1
Reputation: 862541
You can use:
a = df['valueCount'].isnull()
b = a.cumsum()
c = df['valueCount'].bfill()
d = c + (b-b.mask(a).bfill().fillna(0).astype(int)).sub(1)
df['valueCount'] = df['valueCount'].fillna(d)
print (df)
valueCount
0 0.0
1 1.0
2 2.0
3 1.0
4 1.0
5 1.0
6 1.0
7 2.0
8 3.0
9 4.0
Detail + explanation:
#back filling NaN values
x = df['valueCount'].bfill()
#compare by NaNs
a = df['valueCount'].isnull()
#cumulative sum of mask
b = a.cumsum()
#replace Trues to NaNs
c = b.mask(a)
#forward fill NaNs
d = b.mask(a).bfill()
#First NaNs to 0 and cast to integers
e = b.mask(a).bfill().fillna(0).astype(int)
#add to backfilled Series cumulative sum and subtract from cumulative sum Series, 1
f = x + b - e - 1
#replace NaNs by Series f
g = df['valueCount'].fillna(f)
df = pd.concat([df['valueCount'], x, a, b, c, d, e, f, g], axis=1,
keys=('orig','x','a','b','c','d','e', 'f', 'g'))
print (df)
orig x a b c d e f g
0 0.0 0.0 False 0 0.0 0.0 0 -1.0 0.0
1 NaN 2.0 True 1 NaN 1.0 1 1.0 1.0
2 2.0 2.0 False 1 1.0 1.0 1 1.0 2.0
3 1.0 1.0 False 1 1.0 1.0 1 0.0 1.0
4 1.0 1.0 False 1 1.0 1.0 1 0.0 1.0
5 1.0 1.0 False 1 1.0 1.0 1 0.0 1.0
6 NaN 4.0 True 2 NaN 4.0 4 1.0 1.0
7 NaN 4.0 True 3 NaN 4.0 4 2.0 2.0
8 NaN 4.0 True 4 NaN 4.0 4 3.0 3.0
9 4.0 4.0 False 4 4.0 4.0 4 3.0 4.0
Upvotes: 6