Reputation: 542
I am trying to do the following:
Suppose I have the following column in pandas, where equal values always occur in pairs, one after the other.
l = [np.nan, np.nan, 10, np.nan, np.nan, np.nan, 10, np.nan, 4, np.nan, 4, 5, np.nan, 5, np.nan, 2, np.nan, 2, 1, 1]
How can I fill the NaN values only within the interval between two equal values?
Expected output:
[np.nan, np.nan, 10, 10, 10, 10, 10, np.nan, 4, 4, 4, 5, 5, 5, np.nan, 2, 2, 2, 1, 1]
I could only find this answer, which is not the same problem:
Upvotes: 6
Views: 582
Reputation: 30012
You can use
m = df['l'].eq(df['l']).cumsum()
out = df.groupby(m).apply(lambda g: g.fillna({'l': g['l'].max()}) if g.name % 2 else g)
Detailed explanation:
Assume you have the dataframe
l = [np.nan, np.nan, 10, np.nan, np.nan, np.nan, 10, np.nan, 4, np.nan, 4, 5, np.nan, 5, np.nan, 2, np.nan, 2, 1, 1, np.nan]
df = pd.DataFrame({'l': l})
print(df)
l
0 NaN
1 NaN
2 10.0
3 NaN
4 NaN
5 NaN
6 10.0
7 NaN
8 4.0
9 NaN
10 4.0
11 5.0
12 NaN
13 5.0
14 NaN
15 2.0
16 NaN
17 2.0
18 1.0
19 1.0
20 NaN
You can use the fact that NaN is not equal to NaN to build a cumulative sum: comparing the column with itself gives True for every non-NaN value and False for every NaN.
df['m'] = df['l'].eq(df['l']).cumsum()
print(df)
l m
0 NaN 0
1 NaN 0
2 10.0 1
3 NaN 1
4 NaN 1
5 NaN 1
6 10.0 2
7 NaN 2
8 4.0 3
9 NaN 3
10 4.0 4
11 5.0 5
12 NaN 5
13 5.0 6
14 NaN 6
15 2.0 7
16 NaN 7
17 2.0 8
18 1.0 9
19 1.0 10
20 NaN 10
Notice that True (a non-NaN value) only occurs in pairs. The first row can only be True or False, so no matter what the first row is, the cumsum at the first True of the first pair can only be 1, and the cumsum at the second True of the first pair can only be 2.
The same holds for every other pair: the cumsum at the starting True is an odd number, and the cumsum at the ending True is an even number.
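To see the parity pattern concretely, here is a minimal sketch on a shorter, made-up series. The names `s`, `m` and `inside` and the sample values are illustrative and not from the question; `notna()` is used because, for a float column, it gives the same mask as comparing the column with itself.

```python
import numpy as np
import pandas as pd

# Hypothetical short series: one pair (10, 10) with a gap, one adjacent pair (4, 4)
s = pd.Series([np.nan, 10, np.nan, 10, np.nan, 4, 4])

# Running count of non-NaN values seen so far
m = s.notna().cumsum()

# Odd counter: the row is the start of a pair or a NaN inside a pair
# Even counter: the row is the end of a pair or a NaN outside any pair
inside = m % 2 == 1

print(pd.DataFrame({'s': s, 'm': m, 'inside': inside}))
```

Rows where `inside` is True are exactly the ones that should carry the pair's value.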
With this in mind, we can group by the cumsum and fill only the odd-numbered groups with the non-NaN value of each group.
out = df.groupby(m).apply(lambda g: g.fillna({'l': g['l'].max()}) if g.name % 2 else g)
print(out)
l
0 NaN
1 NaN
2 10.0
3 10.0
4 10.0
5 10.0
6 10.0
7 NaN
8 4.0
9 4.0
10 4.0
11 5.0
12 5.0
13 5.0
14 NaN
15 2.0
16 2.0
17 2.0
18 1.0
19 1.0
20 NaN
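The same odd-group idea can also be sketched without `apply`. This is an alternative formulation and not the code from the answer: it assumes, as the question states, that values always come in pairs, and the names `s`, `first_in_group` and `filled` are made up for illustration.

```python
import numpy as np
import pandas as pd

l = [np.nan, np.nan, 10, np.nan, np.nan, np.nan, 10, np.nan, 4, np.nan, 4,
     5, np.nan, 5, np.nan, 2, np.nan, 2, 1, 1, np.nan]
s = pd.Series(l, name='l')

# Group label: running count of non-NaN values
m = s.notna().cumsum()

# First non-NaN value of each group, broadcast back to every row
# (stays NaN for a group that contains only NaN, such as the leading rows)
first_in_group = s.groupby(m).transform('first')

# Replace rows in odd-numbered groups; leave even-numbered groups untouched
filled = s.mask(m % 2 == 1, first_in_group)
print(filled)
```

If the pairing assumption does not hold (for example, a value that appears only once), the parity shifts and the wrong gaps get filled, so this sketch is only safe on data shaped like the example.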
Upvotes: 1
Reputation: 71689
ffill and bfill
f = df['col'].ffill()
b = df['col'].bfill()
df['col'].mask(f == b, f)
0 NaN
1 NaN
2 10.0
3 10.0
4 10.0
5 10.0
6 10.0
7 NaN
8 4.0
9 4.0
10 4.0
11 5.0
12 5.0
13 5.0
14 NaN
15 2.0
16 2.0
17 2.0
18 1.0
19 1.0
Name: col, dtype: float64
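For reference, here is a self-contained sketch of the same approach with the setup included. The column name `col` follows the snippet above; `expected` and `result` are names introduced here for the check. The idea is that a NaN lies between two equal values exactly when the last valid value before it (`ffill`) equals the next valid value after it (`bfill`).

```python
import numpy as np
import pandas as pd

l = [np.nan, np.nan, 10, np.nan, np.nan, np.nan, 10, np.nan, 4, np.nan, 4,
     5, np.nan, 5, np.nan, 2, np.nan, 2, 1, 1]
df = pd.DataFrame({'col': l})

f = df['col'].ffill()  # last valid value at or before each row
b = df['col'].bfill()  # next valid value at or after each row

# Where both agree, take the forward-filled value; elsewhere keep the original.
# Leading or trailing NaN rows stay NaN because NaN == anything is False.
result = df['col'].mask(f == b, f)

expected = [np.nan, np.nan, 10, 10, 10, 10, 10, np.nan, 4, 4, 4,
            5, 5, 5, np.nan, 2, 2, 2, 1, 1]
print(np.allclose(result.to_numpy(), expected, equal_nan=True))
```

One caveat of this sketch: if two consecutive pairs happen to hold the same value, the NaN gap between them is filled as well, since the comparison cannot tell where one pair ends and the next begins.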
Upvotes: 7