Reputation: 105
Suppose
test = pd.DataFrame([1,2,3,np.nan,np.nan,np.nan,np.nan,np.nan,4,5,6,np.nan,np.nan,np.nan,np.nan,np.nan,3,4,np.nan])
I would like to apply following command:
test.interpolate(limit = 2, limit_direction = 'both', limit_area = 'inside')
which returns
0
0 1.000000
1 2.000000
2 3.000000
3 3.166667
4 3.333333
5 NaN
6 3.666667
7 3.833333
8 4.000000
9 5.000000
10 6.000000
11 5.500000
12 5.000000
13 NaN
14 4.000000
15 3.500000
16 3.000000
17 4.000000
18 NaN
Question: How can i apply a restriction on the minimum number of valid numbers (i.e not NaN) before AND after a group of NaNs, so as to apply the interpolation
In this example, i would like to fill first group of NaNs because there are minimum 3 valid numbers before AND after, but NOT interpolate the second group of NaNs, as there are only two valid numbers after the NaNs (and not 3 as i would prefer)
Expected result:
0
0 1.000000
1 2.000000
2 3.000000
3 3.166667
4 3.333333
5 NaN
6 3.666667
7 3.833333
8 4.000000
9 5.000000
10 6.000000
11 NaN
12 NaN
13 NaN
14 NaN
15 NaN
16 3.000000
17 4.000000
18 NaN
Upvotes: 2
Views: 184
Reputation: 25544
EDIT 1: revised my first answer. One more go with some sort of mask approach based on this Q&A.
EDIT 2: added copy back to pd df using deepcopy
to avoid the copy-by-reference issue.
import numpy as np
import pandas as pd
from copy import deepcopy
a = np.array([1,2,3,np.nan,np.nan,np.nan,np.nan,np.nan,4,5,6,np.nan,np.nan,np.nan,np.nan,np.nan,3,4,np.nan,1])
df = pd.DataFrame(a)
# store values for later, to keep information from blocks that are below size limit:
temp = deepcopy(df[df[0].notnull()])
mask = np.concatenate(([False],np.isfinite(a),[False]))
idx = np.nonzero(mask[1:] != mask[:-1])[0] # start and stop indices of your blocks of finite numbers
counts = (np.flatnonzero(mask[1:] < mask[:-1]) - np.flatnonzero(mask[1:] > mask[:-1])) # n finite numbers per block
sz_limit = 2 # set limit, exclusive in this case
for i, size in enumerate(counts):
if size <= sz_limit:
a[idx[i*2]:idx[i*2+1]] = np.nan
now call the interpolation and write back values from 'too small' blocks:
a_inter = pd.DataFrame(a).interpolate(limit = 2, limit_direction = 'both', limit_area = 'inside')
a_inter.update(other = temp)
a_inter
is then
0
0 1.000000
1 2.000000
2 3.000000
3 3.166667
4 3.333333
5 NaN
6 3.666667
7 3.833333
8 4.000000
9 5.000000
10 6.000000
11 NaN
12 NaN
13 NaN
14 NaN
15 NaN
16 NaN
17 NaN
18 NaN
to improve this hack, you could put the masking in a function and get rid of the for loop.
Upvotes: 1