Mark Hallbauer
Mark Hallbauer

Reputation: 75

Python Pandas: fillna / bfill using function

I've retrieved stock movements for several items from an inventory management system:

index   itemid    date      sold    received    balance  stock_level
0       123456    30.03.18  -1      0           -1       3
1       123456    04.04.18  -1      0           -1       2
2       123456    09.04.18   0      1            1       3
3       123457    01.04.18   0      1            1       3
4       123457    03.04.18  -1      0           -1       2
5       123457    11.04.18   0      1            1       3

In order to perform proper analysis, I've created a continuous date sequence

index   itemid    date    sold  received   balance  stock_level
0       123456    28.03.18  0   0           0       nan
1       123456    29.03.18  0   0           0       nan
2       123456    30.03.18  -1  0           -1      3
3       123456    31.03.18  0   0           0       nan
4       123456    01.04.18  0   0           0       nan
5       123456    02.04.18  0   0           0       nan
6       123456    03.04.18  0   0           0       nan
7       123456    04.04.18  -1  0           -1      2
8       123456    05.04.18  0   0           0       nan
9       123456    06.04.18  0   0           0       nan
10      123456    07.04.18  0   0           0       nan
11      123456    08.04.18  0   0           0       nan
12      123456    09.04.18  0   1           1       3
13      123456    10.04.18  0   0           0       nan
14      123456    11.04.18  0   0           0       nan
15      123457    28.03.18  0   0           0       nan
16      123457    29.03.18  0   0           0       nan
17      123457    30.03.18  0   0           0       nan
18      123457    31.03.18  0   0           0       nan
19      123457    01.04.18  0   1           1       3
20      123457    02.04.18  0   0           0       nan
21      123457    03.04.18  -1  0           -1      2
22      123457    04.04.18  0   0           0       nan
23      123457    05.04.18  0   0           0       nan
[...]
28      123457    11.04.18  0   1           1       3

Now, I need to fill the nan values in the stock column for each item groupby(itemid). I can use ffill for each group to fill the values as of the 3rd row, but need to bfill for each group based on a function of (the first non-nan stock value) - (balance value at that index).

E.g. bfill of index 0 should be (stock_level at index 2) - (balance at index 2).

That is to say, the outcome I'm looking for is

index   itemid    date    sold  received   balance  stock_level
0       123456    28.03.18  0   0           0       4
1       123456    29.03.18  0   0           0       4
2       123456    30.03.18  -1  0           -1      3
3       123456    31.03.18  0   0           0       3
4       123456    01.04.18  0   0           0       3
5       123456    02.04.18  0   0           0       3
6       123456    03.04.18  0   0           0       3
7       123456    04.04.18  -1  0           -1      2
8       123456    05.04.18  0   0           0       2
9       123456    06.04.18  0   0           0       2
10      123456    07.04.18  0   0           0       2
11      123456    08.04.18  0   0           0       2
12      123456    09.04.18  0   1           1       3
13      123456    10.04.18  0   0           0       3
14      123456    11.04.18  0   0           0       3
15      123457    28.03.18  0   0           0       2
16      123457    29.03.18  0   0           0       2
17      123457    30.03.18  0   0           0       2
18      123457    31.03.18  0   0           0       2
19      123457    01.04.18  0   1           1       3
20      123457    02.04.18  0   0           0       3
21      123457    03.04.18  -1  0           -1      2
22      123457    04.04.18  0   0           0       2
23      123457    05.04.18  0   0           0       2
[...]
28      123457    11.04.18  0   1           1       3

How can I backfill for each item group based on a function at the index of the first non-nan value?

Upvotes: 1

Views: 1159

Answers (1)

cs95
cs95

Reputation: 402603

After ffill, the only NaNs that remain are those at the start.
You'll need to use pd.Series.first_valid_index here, along with a combination of ffill and fillna.

i = df.stock_level.first_valid_index()
df.stock_level.ffill().fillna(df.stock_level[i] - df.balance[i])

0     4.0
1     4.0
2     3.0
3     3.0
4     3.0
5     3.0
6     3.0
7     2.0
8     2.0
9     2.0
10    2.0
11    2.0
12    3.0
13    3.0
14    3.0
Name: stock_level, dtype: float64

Upvotes: 2

Related Questions