lkbu

Reputation: 66

Filling gaps based on gap length

I am currently working with financial data, specifically with missing financial data. What I'm trying to do is fill the gaps based on gap length, for example:
- if the gap is shorter than 5 NaNs, interpolate;
- if it is 5 NaNs or longer, fill it with values from a different series.

So what I am trying to build is a function that scans a series for runs of NaN, gets the length of each run and then fills each run appropriately. I want to push as much of this as possible into pandas/NumPy operations rather than Python loops.
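One common pandas idiom for the "scan for NaN runs and get their lengths" step, sketched here under the assumption that a plain per-row gap length is what you need (the helper name `gap_lengths` is hypothetical, not a pandas API): label each run of consecutive NaN/non-NaN values with `diff().ne(0).cumsum()`, then broadcast each run's size back to its rows with `groupby(...).transform("size")`.

```python
import numpy as np
import pandas as pd


def gap_lengths(s: pd.Series) -> pd.Series:
    """Hypothetical helper: length of the NaN run each element belongs to (0 for non-NaN)."""
    is_na = s.isna()
    # A new run starts wherever the NaN/non-NaN state changes; cumsum turns that into run ids.
    run_id = is_na.astype(int).diff().ne(0).cumsum()
    # Size of each run, broadcast back to every row of that run.
    run_len = is_na.groupby(run_id).transform("size")
    # Only NaN rows have a gap length; valid rows get 0.
    return run_len.where(is_na, 0)


s = pd.Series([1.0, np.nan, np.nan, 4.0, np.nan, 6.0, np.nan, np.nan, np.nan])
print(gap_lengths(s).tolist())  # [0, 2, 2, 0, 1, 0, 3, 3, 3]
```

Unlike a row-by-row loop, this also handles a gap that runs to the end of the series.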

Below is just an example; it is far from optimal:

import numpy as np
import pandas as pd

ser = pd.Series(np.sort(np.random.uniform(size=100)))
ser.iloc[48:52] = np.nan  # a gap of 4 NaNs
ser.iloc[10:20] = np.nan  # a gap of 10 NaNs

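def count(a):
    """Overwrite each run of NaNs in `a` (in place) with the length of that run."""
    tmp = 0
    for i in range(len(a)):
        current = a.iloc[i]
        if not np.isnan(current) and tmp > 0:
            a.iloc[i - tmp:i] = tmp
            tmp = 0
        if np.isnan(current):
            tmp += 1
    if tmp > 0:
        # a gap that runs to the end of the series is never closed inside the loop
        a.iloc[len(a) - tmp:] = tmp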

g = ser.copy()
count(g)
g[g < 1] = 0  # original values lie in [0, 1), gap lengths are >= 1

df = pd.DataFrame(ser, columns=['ser'])
df['group'] = g
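The Python-level loop in `count` can be replaced by NumPy array operations, which should help with the speed concern without Numba. A minimal sketch, assuming the input is a float NumPy array (for a Series, pass `ser.to_numpy()` and wrap the result back with `ser.index`); the name `gap_lengths_np` is hypothetical. It finds the start and end of every NaN run from the edges of a padded boolean mask and writes each run's length into its positions, which reproduces `count` followed by `g[g < 1] = 0`.

```python
import numpy as np


def gap_lengths_np(values: np.ndarray) -> np.ndarray:
    """Hypothetical helper: each NaN gets the length of its run, every other entry gets 0."""
    is_na = np.isnan(values)
    # Pad with False on both sides so every NaN run has a rising and a falling edge.
    padded = np.concatenate(([False], is_na, [False])).astype(np.int8)
    edges = np.flatnonzero(np.diff(padded))
    starts, ends = edges[0::2], edges[1::2]  # run k covers values[starts[k]:ends[k]]
    lengths = ends - starts
    out = np.zeros(len(values), dtype=np.int64)
    # NaN positions appear run by run, so repeating each length len times lines up with them.
    out[is_na] = np.repeat(lengths, lengths)
    return out


print(gap_lengths_np(np.array([1.0, np.nan, np.nan, 4.0, np.nan])).tolist())  # [0, 2, 2, 0, 1]
```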

Now we want to interpolate where the gap is shorter than 10 and put a placeholder value where the gap is 10 or longer:

df['ready'] = df.loc[df.group < 10, 'ser'].interpolate(method='linear')
df.loc[df.group > 9, 'ready'] = 100  # placeholder for long gaps
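The two steps above can be folded into one function that fills long gaps from a second series, as described at the top of the question. This is a sketch under stated assumptions: `fill_by_gap_length`, `other` and `max_gap` are hypothetical names, `other` is assumed to share the index of `s`, and `max_gap=9` mirrors the `< 10` / `> 9` split used above. It interpolates the whole series once and keeps the interpolated values only at short-gap positions; a gap at the very start of the series cannot be linearly interpolated and stays NaN.

```python
import numpy as np
import pandas as pd


def fill_by_gap_length(s: pd.Series, other: pd.Series, max_gap: int = 9) -> pd.Series:
    """Hypothetical helper: interpolate NaN runs of length <= max_gap, fill longer runs from `other`."""
    is_na = s.isna()
    run_id = is_na.astype(int).diff().ne(0).cumsum()
    gap_len = is_na.groupby(run_id).transform("size").where(is_na, 0)

    short = is_na & (gap_len <= max_gap)
    long_ = gap_len > max_gap

    out = s.copy()
    # Interpolate everything once; keep the result only at short-gap positions.
    out[short] = s.interpolate(method="linear")[short]
    # Long gaps take their values from the other series (aligned on the index).
    out[long_] = other[long_]
    return out


s = pd.Series([1.0, np.nan, 3.0, np.nan, np.nan, np.nan, 7.0])
other = pd.Series([10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0])
filled = fill_by_gap_length(s, other, max_gap=2)
print(filled.tolist())  # [1.0, 2.0, 3.0, 40.0, 50.0, 60.0, 7.0]
```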

To sum up, two questions:
- Can pandas do this in a robust way?
- If not, what can you suggest to make my approach more robust and faster?

Let's focus on two points here. First, there is the loop over the series, which will take ages once I have, say, 100 series with gaps. Maybe something like Numba would help? Second, I'm interpolating on copies; any suggestions on how to do it in place?
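For the "100 series" and "in place" points, one option is to keep all series as columns of a single DataFrame and compute gap lengths for every column at once. A sketch under assumptions: `gap_lengths_frame` and `fill_frame_in_place` are hypothetical names, the index is assumed unique, and `other` is a frame of replacement values with the same index and columns. It does not use Numba and has not been benchmarked against a Numba loop. The gap length is computed as "position counted from the run's start" plus "position counted from the run's end" minus 1, using only cumulative sums and forward fills; `DataFrame.mask(..., inplace=True)` then writes the results back into the original frame (pandas may still allocate temporaries internally).

```python
import numpy as np
import pandas as pd


def gap_lengths_frame(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: NaN-run length for every cell of every column (0 for non-NaN)."""
    is_na = df.isna()
    counts = is_na.cumsum()
    # 1-based position of each NaN inside its run, counted from the run's start.
    fwd = counts - counts.where(~is_na).ffill().fillna(0)
    # The same position counted from the run's end, computed on the reversed frame.
    rev_na = is_na.iloc[::-1]
    rev_counts = rev_na.cumsum()
    bwd = (rev_counts - rev_counts.where(~rev_na).ffill().fillna(0)).iloc[::-1]
    # start offset + end offset - 1 = run length; non-NaN cells get 0.
    return (fwd + bwd - 1).where(is_na, 0).astype(int)


def fill_frame_in_place(df: pd.DataFrame, other: pd.DataFrame, max_gap: int = 9) -> None:
    """Hypothetical helper: interpolate short gaps and copy long gaps from `other`, modifying df."""
    gaps = gap_lengths_frame(df)
    short = (gaps > 0) & (gaps <= max_gap)
    long_ = gaps > max_gap
    interpolated = df.interpolate(method="linear")
    df.mask(short, interpolated, inplace=True)
    df.mask(long_, other, inplace=True)


frame = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, np.nan, np.nan, np.nan, 7.0],
    "b": [np.nan, np.nan, 2.0, 3.0, np.nan, 5.0, 6.0],
})
replacement = pd.DataFrame(100.0, index=frame.index, columns=frame.columns)
fill_frame_in_place(frame, replacement, max_gap=2)
print(frame)
```

In column `b` the leading 2-NaN gap counts as short but stays NaN, because linear interpolation has no left neighbour there.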

Thanks for having a look

Upvotes: 2

Views: 2053

Answers (2)

lkbu

Reputation: 66

After a lengthy search for an answer, it turns out there is no built-in way to do a fillna based on gap length.

Conclusion: the code from the question can be used; the idea works.

Upvotes: 0

Bob Haffner

Reputation: 8483

You could leverage interpolate's limit parameter.

df['ready'] = df.loc[df.group<10,['ser']].interpolate(method='linear',limit=9)

limit : int, default None. Maximum number of consecutive NaNs to fill.

Then run interpolate() a second time with a different method, or run fillna(), on whatever is left.
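One caveat when relying on `limit` alone: it caps how many NaNs are filled per gap, but it does not skip gaps longer than the limit. With the default forward direction, a 10-long gap and `limit=9` still gets its first 9 positions interpolated. The sketch below shows this on a toy series (values and the threshold of 2 are illustrative only), then combines a full interpolation with a gap-length mask so that only short gaps are filled, followed by the second `fillna()` pass suggested above.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, np.nan, np.nan, np.nan, 6.0])  # one gap of 4 NaNs

# limit=2 fills the first 2 NaNs of the gap and leaves the other 2 as NaN.
partial = s.interpolate(method="linear", limit=2)
print(partial.tolist())  # [1.0, 2.0, 3.0, nan, nan, 6.0]

# Gap length per row (0 for non-NaN rows), so whole gaps can be kept or dropped.
is_na = s.isna()
run_id = is_na.astype(int).diff().ne(0).cumsum()
gap_len = is_na.groupby(run_id).transform("size").where(is_na, 0)

# Interpolate everything, then keep interpolated values only where the gap is short enough.
short_only = s.interpolate(method="linear").where(gap_len <= 2)
# Second pass: fill whatever is left (here with the question's placeholder value).
second_pass = short_only.fillna(100.0)
print(second_pass.tolist())  # [1.0, 100.0, 100.0, 100.0, 100.0, 6.0]
```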

Upvotes: 1
