Reputation: 634
Task
I have the following df
and want to forward fill the NaN by each id
with the previous value. On the last value in each column, I want it to only forward fill 2 more times.
I currently use this:
df.set_index(['id', 'date'], inplace=True)
df = df.ffill().where(df.bfill().notnull())
However, it forward fills more than the desired 2 times. Any help would be super!
df
id date compname fyr
1 1 2016-02-29 a 5.0
2 1 2016-03-31 NaN NaN
3 1 2016-04-30 NaN NaN
4 1 2016-05-31 a 5.0
5 1 2016-06-30 NaN NaN
6 1 2016-07-31 NaN NaN
7 1 2016-08-31 a 5.0
8 1 2016-09-30 NaN NaN
9 1 2016-10-31 NaN NaN
10 1 2016-11-30 a 5.0
11 1 2016-12-31 NaN NaN
12 1 2017-01-31 NaN NaN
13 1 2017-02-28 NaN NaN
14 1 2017-03-31 NaN NaN
15 2 2016-02-29 b 5.0
16 2 2016-03-31 NaN NaN
17 2 2016-04-30 NaN NaN
18 2 2016-05-31 b 5.0
19 2 2016-06-30 NaN NaN
20 2 2016-07-31 NaN NaN
21 2 2016-08-31 b 5.0
22 2 2016-09-30 NaN NaN
23 2 2016-10-31 NaN NaN
24 2 2016-11-30 b 5.0
25 2 2016-12-31 NaN NaN
26 2 2017-01-31 NaN NaN
27 2 2017-02-28 NaN NaN
28 2 2017-03-31 NaN NaN
desired df
id date compname fyr
1 1 2016-02-29 a 5.0
2 1 2016-03-31 a 5.0
3 1 2016-04-30 a 5.0
4 1 2016-05-31 a 5.0
5 1 2016-06-30 a 5.0
6 1 2016-07-31 a 5.0
7 1 2016-08-31 a 5.0
8 1 2016-09-30 a 5.0
9 1 2016-10-31 a 5.0
10 1 2016-11-30 a 5.0
11 1 2016-12-31 a 5.0
12 1 2017-01-31 a 5.0
13 1 2017-02-28 NaN NaN
14 1 2017-03-31 NaN NaN
15 2 2016-02-29 b 5.0
16 2 2016-03-31 b 5.0
17 2 2016-04-30 b 5.0
18 2 2016-05-31 b 5.0
19 2 2016-06-30 b 5.0
20 2 2016-07-31 b 5.0
21 2 2016-08-31 b 5.0
22 2 2016-09-30 b 5.0
23 2 2016-10-31 b 5.0
24 2 2016-11-30 b 5.0
25 2 2016-12-31 b 5.0
26 2 2017-01-31 b 5.0
27 2 2017-02-28 NaN NaN
28 2 2017-03-31 NaN NaN
Upvotes: 1
Views: 38
Reputation: 862511
If need back and forward filling per groups only 2 values add parameter limit
to bfill
and ffill
methods:
f = lambda x: x.bfill(limit=2).ffill(limit=2)
df[["compname","fyr"]] = df.groupby("id")[["compname","fyr"]].apply(f)
print (df)
id date compname fyr
1 1 2016-02-29 a 5.0
2 1 2016-03-31 a 5.0
3 1 2016-04-30 a 5.0
4 1 2016-05-31 a 5.0
5 1 2016-06-30 a 5.0
6 1 2016-07-31 a 5.0
7 1 2016-08-31 a 5.0
8 1 2016-09-30 a 5.0
9 1 2016-10-31 a 5.0
10 1 2016-11-30 a 5.0
11 1 2016-12-31 a 5.0
12 1 2017-01-31 a 5.0
13 1 2017-02-28 NaN NaN
14 1 2017-03-31 NaN NaN
15 2 2016-02-29 b 5.0
16 2 2016-03-31 b 5.0
17 2 2016-04-30 b 5.0
18 2 2016-05-31 b 5.0
19 2 2016-06-30 b 5.0
20 2 2016-07-31 b 5.0
21 2 2016-08-31 b 5.0
22 2 2016-09-30 b 5.0
23 2 2016-10-31 b 5.0
24 2 2016-11-30 b 5.0
25 2 2016-12-31 b 5.0
26 2 2017-01-31 b 5.0
27 2 2017-02-28 NaN NaN
28 2 2017-03-31 NaN NaN
Upvotes: 1
Reputation: 22493
One way is to first do a groupby
and bfill
, then another groupby
with fillna
:
df[["compname","fyr"]] = df.groupby("id")["compname","fyr"].bfill()
print (df.groupby("id").apply(lambda d: d.fillna(d.shift(2))))
id date compname fyr
1 1 2016-02-29 a 5.0
2 1 2016-03-31 a 5.0
3 1 2016-04-30 a 5.0
4 1 2016-05-31 a 5.0
5 1 2016-06-30 a 5.0
6 1 2016-07-31 a 5.0
7 1 2016-08-31 a 5.0
8 1 2016-09-30 a 5.0
9 1 2016-10-31 a 5.0
10 1 2016-11-30 a 5.0
11 1 2016-12-31 a 5.0
12 1 2017-01-31 a 5.0
13 1 2017-02-28 NaN NaN
14 1 2017-03-31 NaN NaN
15 2 2016-02-29 b 5.0
16 2 2016-03-31 b 5.0
17 2 2016-04-30 b 5.0
18 2 2016-05-31 b 5.0
19 2 2016-06-30 b 5.0
20 2 2016-07-31 b 5.0
21 2 2016-08-31 b 5.0
22 2 2016-09-30 b 5.0
23 2 2016-10-31 b 5.0
24 2 2016-11-30 b 5.0
25 2 2016-12-31 b 5.0
26 2 2017-01-31 b 5.0
27 2 2017-02-28 NaN NaN
28 2 2017-03-31 NaN NaN
Upvotes: 1