Reputation: 141
I have an array with missing values in various places.
import numpy as np
import pandas as pd
x = np.arange(1,10).astype(float)
x[[0,1,6]] = np.nan
df = pd.Series(x)
print(df)
0 NaN
1 NaN
2 3.0
3 4.0
4 5.0
5 6.0
6 NaN
7 8.0
8 9.0
dtype: float64
For each NaN, I want to take the value following it and divide it by two, then propagate that result backward through any consecutive NaNs (halving again each time), so I would end up with:
0 0.75
1 1.5
2 3.0
3 4.0
4 5.0
5 6.0
6 4.0
7 8.0
8 9.0
dtype: float64
I've tried df.interpolate(), but that doesn't seem to work with consecutive NaNs.
Upvotes: 3
Views: 1496
Reputation: 862901
Another solution, using ffill() (the same as fillna with method ffill) together with bfill():
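# reverse the Series and flag the NaNs
b = df[::-1].isnull()
# count the NaNs in each consecutive run, multiply by 2, and replace 0 with 1
# (this yields divisors 2 and 4, so it is correct for runs of at most two NaNs)
a = (b.cumsum() - b.cumsum().where(~b).ffill()).mul(2).replace({0:1})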
print(a)
8 1
7 1
6 2
5 1
4 1
3 1
2 1
1 2
0 4
dtype: int32
print(df.bfill().div(a))
0 0.75
1 1.50
2 3.00
3 4.00
4 5.00
5 6.00
6 4.00
7 8.00
8 9.00
dtype: float64
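Multiplying the run count by 2 gives the right divisor only for runs of one or two NaNs (2 and 4); a run of three would need 8, not 6. Below is a minimal sketch of a variant that raises 2 to the run position instead. It assumes the same reversed-cumsum trick as above, and the name halve_backfill is hypothetical, not a pandas function:

```python
import numpy as np
import pandas as pd

def halve_backfill(s):
    # Hypothetical helper for illustration only.
    # Flag missing values, walking the Series from the end.
    b = s[::-1].isnull()
    c = b.cumsum()
    # Position inside the current run of NaNs (0 for non-missing values).
    run = c - c.where(~b).ffill()
    # Each step further away from the next valid value halves it once more.
    return s.bfill().div(2 ** run)

s = pd.Series([np.nan, np.nan, np.nan, 8.0, np.nan, 6.0])
result = halve_backfill(s)
print(result.sort_index().tolist())  # [1.0, 2.0, 4.0, 8.0, 3.0, 6.0]
```

Trailing NaNs (with no valid value after them) stay NaN here, since bfill has nothing to fill them with.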
Timings (len(df)=9k):
In [315]: %timeit (mat(df))
100 loops, best of 3: 11.3 ms per loop
In [316]: %timeit (jez(df1))
100 loops, best of 3: 2.52 ms per loop
Code for timings:
import numpy as np
import pandas as pd
x = np.arange(1,10).astype(float)
x[[0,1,6]] = np.nan
df = pd.Series(x)
print(df)
df = pd.concat([df]*1000).reset_index(drop=True)
df1 = df.copy()
def jez(df):
    b = df[::-1].isnull()
    a = (b.cumsum() - b.cumsum().where(~b).ffill()).mul(2).replace({0:1})
    return (df.bfill().div(a))

def mat(df):
    prev = 0
    new_list = []
    for i in df.values[::-1]:
        if np.isnan(i):
            new_list.append(prev/2.)
            prev = prev / 2.
        else:
            new_list.append(i)
            prev = i
    return pd.Series(new_list[::-1])

print(mat(df))
print(jez(df1))
Upvotes: 3
Reputation: 6658
You can do something like this:
import numpy as np
import pandas as pd
x = np.arange(1,10).astype(float)
x[[0,1,6]] = np.nan
df = pd.Series(x)
prev = 0
new_list = []
for i in df.values[::-1]:
    if np.isnan(i):
        new_list.append(prev/2.)
        prev = prev / 2.
    else:
        new_list.append(i)
        prev = i
df = pd.Series(new_list[::-1])
It loops over the values of df in reverse, keeping track of the previous value. If the current value is not NaN it is appended as-is; otherwise half of the previous value is appended, and that half becomes the new previous value.
This might not be the most sophisticated pandas solution, but you can change the behavior quite easily.
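As an illustration of how the behavior could be changed, here is a sketch that wraps the same loop in a function with an adjustable factor. The function name fill_from_next and the parameters factor and start are assumptions made for this example, not part of the original answer:

```python
import numpy as np
import pandas as pd

def fill_from_next(s, factor=0.5, start=0.0):
    # Hypothetical helper: `factor` scales the following value,
    # `start` seeds trailing NaNs that have no value after them.
    prev = start
    out = []
    # Walk backward so each NaN can see the value that follows it.
    for v in s.values[::-1]:
        if np.isnan(v):
            prev = prev * factor
        else:
            prev = v
        out.append(prev)
    # Restore the original order and keep the original index.
    return pd.Series(out[::-1], index=s.index)

x = np.arange(1, 10).astype(float)
x[[0, 1, 6]] = np.nan
filled = fill_from_next(pd.Series(x))
print(filled.tolist())  # [0.75, 1.5, 3.0, 4.0, 5.0, 6.0, 4.0, 8.0, 9.0]
```

Passing a different factor, for example factor=0.25, would quarter the following value instead of halving it.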
Upvotes: 2