NSK
NSK

Reputation: 105

Using pandas.interpolate()

Suppose

test = pd.DataFrame([1,2,3,np.nan,np.nan,np.nan,np.nan,np.nan,4,5,6,np.nan,np.nan,np.nan,np.nan,np.nan,3,4,np.nan])

I would like to apply following command:

test.interpolate(limit = 2, limit_direction  = 'both', limit_area = 'inside')

which returns

           0
0   1.000000
1   2.000000
2   3.000000
3   3.166667
4   3.333333
5        NaN
6   3.666667
7   3.833333
8   4.000000
9   5.000000
10  6.000000
11  5.500000
12  5.000000
13       NaN
14  4.000000
15  3.500000
16  3.000000
17  4.000000
18       NaN

Question: How can i apply a restriction on the minimum number of valid numbers (i.e not NaN) before AND after a group of NaNs, so as to apply the interpolation

In this example, i would like to fill first group of NaNs because there are minimum 3 valid numbers before AND after, but NOT interpolate the second group of NaNs, as there are only two valid numbers after the NaNs (and not 3 as i would prefer)

Expected result:

           0
0   1.000000
1   2.000000
2   3.000000
3   3.166667
4   3.333333
5        NaN
6   3.666667
7   3.833333
8   4.000000
9   5.000000
10  6.000000
11       NaN
12       NaN
13       NaN
14       NaN
15       NaN
16  3.000000
17  4.000000
18       NaN

Upvotes: 2

Views: 184

Answers (1)

FObersteiner
FObersteiner

Reputation: 25544

EDIT 1: revised my first answer. One more go with some sort of mask approach based on this Q&A. EDIT 2: added copy back to pd df using deepcopy to avoid the copy-by-reference issue.

    import numpy as np
    import pandas as pd
    from copy import deepcopy

    a = np.array([1,2,3,np.nan,np.nan,np.nan,np.nan,np.nan,4,5,6,np.nan,np.nan,np.nan,np.nan,np.nan,3,4,np.nan,1])

    df = pd.DataFrame(a)
    # store values for later, to keep information from blocks that are below size limit:
    temp = deepcopy(df[df[0].notnull()]) 

    mask = np.concatenate(([False],np.isfinite(a),[False]))
    idx = np.nonzero(mask[1:] != mask[:-1])[0] # start and stop indices of your blocks of finite numbers
    counts = (np.flatnonzero(mask[1:] < mask[:-1]) - np.flatnonzero(mask[1:] > mask[:-1])) # n finite numbers per block

    sz_limit = 2 # set limit, exclusive in this case
    for i, size in enumerate(counts):
        if size <= sz_limit:
            a[idx[i*2]:idx[i*2+1]] = np.nan

now call the interpolation and write back values from 'too small' blocks:


    a_inter = pd.DataFrame(a).interpolate(limit = 2, limit_direction = 'both', limit_area = 'inside') 
    a_inter.update(other = temp)  

a_inter is then

           0
0   1.000000
1   2.000000
2   3.000000
3   3.166667
4   3.333333
5        NaN
6   3.666667
7   3.833333
8   4.000000
9   5.000000
10  6.000000
11       NaN
12       NaN
13       NaN
14       NaN
15       NaN
16       NaN
17       NaN
18       NaN

to improve this hack, you could put the masking in a function and get rid of the for loop.

Upvotes: 1

Related Questions