Reputation: 75
I've retrieved stock movements for several items from an inventory management system:
index itemid date sold received balance stock_level
0 123456 30.03.18 -1 0 -1 3
1 123456 04.04.18 -1 0 -1 2
2 123456 09.04.18 0 1 1 3
3 123457 01.04.18 0 1 1 3
4 123457 03.04.18 -1 0 -1 2
5 123457 11.04.18 0 1 1 3
In order to perform proper analysis, I've created a continuous date sequence
index itemid date sold received balance stock_level
0 123456 28.03.18 0 0 0 nan
1 123456 29.03.18 0 0 0 nan
2 123456 30.03.18 -1 0 -1 3
3 123456 31.03.18 0 0 0 nan
4 123456 01.04.18 0 0 0 nan
5 123456 02.04.18 0 0 0 nan
6 123456 03.04.18 0 0 0 nan
7 123456 04.04.18 -1 0 -1 2
8 123456 05.04.18 0 0 0 nan
9 123456 06.04.18 0 0 0 nan
10 123456 07.04.18 0 0 0 nan
11 123456 08.04.18 0 0 0 nan
12 123456 09.04.18 0 1 1 3
13 123456 10.04.18 0 0 0 nan
14 123456 11.04.18 0 0 0 nan
15 123457 28.03.18 0 0 0 nan
16 123457 29.03.18 0 0 0 nan
17 123457 30.03.18 0 0 0 nan
18 123457 31.03.18 0 0 0 nan
19 123457 01.04.18 0 1 1 3
20 123457 02.04.18 0 0 0 nan
21 123457 03.04.18 -1 0 -1 2
22 123457 04.04.18 0 0 0 nan
23 123457 05.04.18 0 0 0 nan
[...]
28 123457 11.04.18 0 1 1 3
Now, I need to fill the nan values in the stock column for each item groupby(itemid)
. I can use ffill for each group to fill the values as of the 3rd row, but need to bfill for each group based on a function of (the first non-nan stock value) - (balance value at that index).
E.g. bfill of index 0 should be (stock_level at index 2) - (balance at index 2).
That is to say, the outcome I'm looking for is
index itemid date sold received balance stock_level
0 123456 28.03.18 0 0 0 4
1 123456 29.03.18 0 0 0 4
2 123456 30.03.18 -1 0 -1 3
3 123456 31.03.18 0 0 0 3
4 123456 01.04.18 0 0 0 3
5 123456 02.04.18 0 0 0 3
6 123456 03.04.18 0 0 0 3
7 123456 04.04.18 -1 0 -1 2
8 123456 05.04.18 0 0 0 2
9 123456 06.04.18 0 0 0 2
10 123456 07.04.18 0 0 0 2
11 123456 08.04.18 0 0 0 2
12 123456 09.04.18 0 1 1 3
13 123456 10.04.18 0 0 0 3
14 123456 11.04.18 0 0 0 3
15 123457 28.03.18 0 0 0 2
16 123457 29.03.18 0 0 0 2
17 123457 30.03.18 0 0 0 2
18 123457 31.03.18 0 0 0 2
19 123457 01.04.18 0 1 1 3
20 123457 02.04.18 0 0 0 3
21 123457 03.04.18 -1 0 -1 2
22 123457 04.04.18 0 0 0 2
23 123457 05.04.18 0 0 0 2
[...]
28 123457 11.04.18 0 1 1 3
How can I backfill for each item group based on a function at the index of the first non-nan value?
Upvotes: 1
Views: 1159
Reputation: 402603
After ffill
, the only NaNs that remain are those at the start.
You'll need to use pd.Series.first_valid_index
here, along with a combination of ffill
and fillna
.
i = df.stock_level.first_valid_index()
df.stock_level.ffill().fillna(df.stock_level[i] - df.balance[i])
0 4.0
1 4.0
2 3.0
3 3.0
4 3.0
5 3.0
6 3.0
7 2.0
8 2.0
9 2.0
10 2.0
11 2.0
12 3.0
13 3.0
14 3.0
Name: stock_level, dtype: float64
Upvotes: 2