Reputation: 66
I am currently working with financial data, specifically missing financial data. What I'm trying to do is fill the gaps based on gap length, for example:
- if the gap is shorter than 5 NaNs, interpolate
- if the gap is longer than 5 NaNs, fill it with values from a different series
So what I am trying to accomplish here is a function that will scan a series for NaNs, get the length of each gap and then fill each gap appropriately. I want to push as much as I can into pandas/NumPy operations rather than doing it in loops.
Below is just an example; it is not optimal at all:
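```python
import numpy as np
import pandas as pd

ser = pd.Series(np.sort(np.random.uniform(size=100)))
ser.iloc[48:52] = np.nan  # a 4-NaN gap
ser.iloc[10:20] = np.nan  # a 10-NaN gap

def count(a):
    """Overwrite every NaN in `a` with the length of its NaN run (in place)."""
    tmp = 0  # length of the NaN run seen so far
    for i in range(len(a)):
        current = a.iloc[i]
        # A run just ended: write its length into all of its positions
        if not np.isnan(current) and tmp > 0:
            a.iloc[i - tmp:i] = tmp
            tmp = 0
        if np.isnan(current):
            tmp += 1
    # A run that reaches the end of the series is not closed inside the loop
    if tmp > 0:
        a.iloc[len(a) - tmp:] = tmp

g = ser.copy()
count(g)
g[g < 1] = 0  # data values lie in [0, 1), so this zeroes them and keeps the run lengths
df = pd.DataFrame({'ser': ser})
df['group'] = g
```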
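The loop in count can be replaced by a run-length computation built only from vectorized pandas operations. The sketch below is one way to do it, assuming nothing beyond pandas and NumPy; gap_lengths is a hypothetical helper, not a pandas API. It uses the same encoding as the group column (0 for valid values, the gap length for NaNs).

```python
import numpy as np
import pandas as pd


def gap_lengths(s: pd.Series) -> pd.Series:
    """For each element of s, the length of the NaN run it belongs to (0 for non-NaN values)."""
    isna = s.isna()
    # The id increases at every non-NaN value, so all NaNs of one gap share an id
    run_id = (~isna).cumsum()
    # Count the NaNs per id and broadcast the count back to every row with that id
    lengths = isna.astype(int).groupby(run_id).transform("sum")
    # Non-NaN rows share an id with the gap that follows them, so zero them out
    return lengths.where(isna, 0).astype(int)


ser = pd.Series(np.arange(10, dtype=float))
ser.iloc[[2, 3]] = np.nan
ser.iloc[6:9] = np.nan
print(gap_lengths(ser).tolist())  # [0, 0, 2, 2, 0, 0, 3, 3, 3, 0]
```

Because the counter only moves on valid values, a single groupby sizes every gap in one pass, including gaps at the start or end of the series.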
Now we want to interpolate where the gap is shorter than 10 and put something else where it is longer than 9:
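```python
# Interpolate only the rows that are not part of a long gap (group < 10)
df['ready'] = df.loc[df['group'] < 10, 'ser'].interpolate(method='linear')
# Put a placeholder value into the long gaps (group > 9)
df.loc[df['group'] > 9, 'ready'] = 100
```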
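The rule from the top of the question (interpolate short gaps, take long gaps from a different series) can be written on top of the same gap-length idea. A minimal sketch, assuming the replacement series other shares the index of s; fill_by_gap_length and max_interp_gap are hypothetical names. Gaps of exactly max_interp_gap NaNs are treated as long here, because the question leaves that case open, and short gaps at the very start or end of the series have no neighbour on one side, so they stay NaN.

```python
import numpy as np
import pandas as pd


def gap_lengths(s: pd.Series) -> pd.Series:
    """Length of the NaN run each element belongs to (0 for non-NaN values)."""
    isna = s.isna()
    run_id = (~isna).cumsum()
    return isna.astype(int).groupby(run_id).transform("sum").where(isna, 0)


def fill_by_gap_length(s: pd.Series, other: pd.Series, max_interp_gap: int = 5) -> pd.Series:
    """Interpolate gaps shorter than max_interp_gap; copy longer gaps from `other`.

    Assumes `other` has the same index as `s`.
    """
    gaps = gap_lengths(s)
    # Interpolating the whole series once is fine: a short gap only depends on the
    # valid values around it. limit_area="inside" leaves NaNs at the edges unfilled.
    interpolated = s.interpolate(method="linear", limit_area="inside")
    out = s.copy()
    short = (gaps > 0) & (gaps < max_interp_gap)
    out[short] = interpolated[short]
    long_gap = gaps >= max_interp_gap
    out[long_gap] = other[long_gap]
    return out


s = pd.Series([1.0, np.nan, 3.0, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 10.0])
other = pd.Series(np.full(10, -1.0))
print(fill_by_gap_length(s, other).tolist())
# [1.0, 2.0, 3.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, 10.0]
```

The question's own example corresponds to max_interp_gap=10 with other holding the placeholder value 100.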
To sum up, 2 questions:
- Can pandas do this in a robust way?
- If not, what can you suggest to make my approach more robust and faster?
Let's focus on two points here. First, there is the loop over the series; it will take ages once I have, say, 100 series with gaps. Maybe something like Numba would help? Second, I'm interpolating on copies; any suggestions on how to do it in place?
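For the many-series case, the same counting can be done for all columns at once with NumPy, so neither a Python loop nor Numba is needed for this step. A sketch, assuming the series are aligned as the columns of one float array or DataFrame; gap_lengths_2d is a hypothetical name. Each (column, gap) pair gets a unique integer id, and np.bincount counts the NaNs per id.

```python
import numpy as np
import pandas as pd


def gap_lengths_2d(values: np.ndarray) -> np.ndarray:
    """Gap length for every cell of a 2-D float array, column by column, without Python loops."""
    isna = np.isnan(values)
    n_rows, n_cols = isna.shape
    # Per column, the id increases at every non-NaN value, so one gap shares one id
    run_id = np.cumsum(~isna, axis=0)
    # Offset the ids per column so that (column, run) pairs never collide
    gid = run_id + np.arange(n_cols) * (n_rows + 1)
    # Count NaNs per (column, run) and look the count up for every cell
    counts = np.bincount(gid[isna], minlength=n_cols * (n_rows + 1))
    return np.where(isna, counts[gid], 0)


df = pd.DataFrame({"a": [1.0, np.nan, np.nan, 4.0], "b": [np.nan, np.nan, 3.0, np.nan]})
gaps = pd.DataFrame(gap_lengths_2d(df.to_numpy(dtype=float)), index=df.index, columns=df.columns)
print(gaps["a"].tolist(), gaps["b"].tolist())  # [0, 2, 2, 0] [2, 2, 0, 1]
```

The resulting frame can be turned into boolean masks such as (gaps > 0) & (gaps < 10) and passed to DataFrame.where or DataFrame.mask together with a single df.interpolate() call, which covers all columns without a per-column loop.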
Thanks for having a look
Upvotes: 2
Views: 2053
Reputation: 66
After a lengthy search for an answer, it turns out there is no built-in way to do fillna based on gap length.
Conclusion: the code from the question can be used; the idea works.
Upvotes: 0
Reputation: 8483
You could leverage interpolate's limit parameter.
```python
df['ready'] = df.loc[df['group'] < 10, 'ser'].interpolate(method='linear', limit=9)
```
limit : int, default None. Maximum number of consecutive NaNs to fill.
Then run interpolate() a second time with a different method, or simply run fillna(), to fill what remains.
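A quick check of how limit behaves on its own, using a tiny made-up series and limit=2 instead of 9 to keep it short. With the default forward direction, limit caps the fill inside each gap but does not skip long gaps: the first print shows the first two NaNs of the four-NaN gap filled and the last two still NaN. Masking the interpolation by gap length (an addition, not part of the answer above) leaves long gaps untouched for the second pass.

```python
import numpy as np
import pandas as pd

s = pd.Series([0.0, np.nan, 2.0, np.nan, np.nan, np.nan, np.nan, 7.0])

# limit=2 caps how many consecutive NaNs are filled, but it does not skip long
# gaps: the first 2 NaNs of the 4-long gap are filled, the other 2 stay NaN.
partial = s.interpolate(method="linear", limit=2)
print(partial.tolist())

# To leave long gaps completely untouched, mask the interpolation by gap length.
isna = s.isna()
run_id = (~isna).cumsum()
gaps = isna.astype(int).groupby(run_id).transform("sum").where(isna, 0)
short_only = s.interpolate(method="linear").where(gaps <= 2)

# Second pass for the remaining (long) gaps, here with the question's placeholder 100.
filled = short_only.fillna(100)
print(filled.tolist())  # [0.0, 1.0, 2.0, 100.0, 100.0, 100.0, 100.0, 7.0]
```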
Upvotes: 1