How to compute a column depending on previous values of one and current values of another column

Question

I have not too much experience with pandas, and I have the following DataFrame:

month       A               B
2/28/2017   0.7377573034    0
3/31/2017   0.7594787565    3.7973937824
4/30/2017   0.7508308808    3.7541544041
5/31/2017   0.7038814004    7.0388140044
6/30/2017   0.6920212254    11.0723396061
7/31/2017   0.6801610503    11.5627378556
8/31/2017   0.6683008753    10.6928140044
9/30/2017   0.7075915026    11.3214640415
10/31/2017  0.6989436269    7.6883798964
11/30/2017  0.6259514607    4.3816602247
12/31/2017  0.6119757303    3.671854382
1/31/2018   0.633           3.798
2/28/2018   0.598           4.784
3/31/2018   0.673           5.384
4/30/2018   0.673           1.346
5/31/2018   0.609           0
6/30/2018   0.609           0
7/31/2018   0.609           0
8/31/2018   0.609           0
9/30/2018   0.673           0
10/31/2018  0.673           0
11/30/2018  0.598           0
12/31/2018  0.598           0

I need to compute column C which basically is column A times column B, but the value of column B is the value of the previous year of the corresponding month. In addition, for values not having the corresponding month in the previous year, this value should be zero. To be more specific, this is what I expect C to be:

C
0 # these values are zero because the corresponding month in the previous year is not in column A
0
0
0
0
0
0
0
0
0
0
0
0               # 0.598 * 0
2.5556460155552 # 0.673 * 3.7973937824
2.5265459139593 # 0.673 * 3.7541544041
4.2866377286796 # 0.609 * 7.0388140044
6.7430548201149 # 0.609 * 11.0723396061
7.0417073540604 # 0.609 * 11.5627378556
6.5119237286796 # 0.609 * 10.6928140044
7.6193452999295 # 0.673 * 11.3214640415
5.1742796702772 # 0.673 * 7.6883798964
2.6202328143706 # 0.598 * 4.3816602247
2.195768920436  # 0.598 * 3.671854382

How can I achieve this? I am sure there might be a way to do it not using a for loop. Thanks in advance.

MaxU - stand with Ukraine · Accepted Answer

In [73]: (df.drop('B',1)
    ...:   .merge(df.drop('A',1)
    ...:            .assign(month=df.month + pd.offsets.MonthEnd(12)),
    ...:          on='month', how='left')
    ...:   .eval("C = A * B", inplace=False)
    ...:   .fillna(0)
    ...: )
    ...:
Out[73]:
        month         A          B         C
0  2017-02-28  0.737757   0.000000  0.000000
1  2017-03-31  0.759479   0.000000  0.000000
2  2017-04-30  0.750831   0.000000  0.000000
3  2017-05-31  0.703881   0.000000  0.000000
4  2017-06-30  0.692021   0.000000  0.000000
5  2017-07-31  0.680161   0.000000  0.000000
6  2017-08-31  0.668301   0.000000  0.000000
7  2017-09-30  0.707592   0.000000  0.000000
8  2017-10-31  0.698944   0.000000  0.000000
9  2017-11-30  0.625951   0.000000  0.000000
10 2017-12-31  0.611976   0.000000  0.000000
11 2018-01-31  0.633000   0.000000  0.000000
12 2018-02-28  0.598000   0.000000  0.000000
13 2018-03-31  0.673000   3.797394  2.555646
14 2018-04-30  0.673000   3.754154  2.526546
15 2018-05-31  0.609000   7.038814  4.286638
16 2018-06-30  0.609000  11.072340  6.743055
17 2018-07-31  0.609000  11.562738  7.041707
18 2018-08-31  0.609000  10.692814  6.511924
19 2018-09-30  0.673000  11.321464  7.619345
20 2018-10-31  0.673000   7.688380  5.174280
21 2018-11-30  0.598000   4.381660  2.620233
22 2018-12-31  0.598000   3.671854  2.195769

Explanation:

we can generate a helper DF like this (we have added 12 months to month column and dropped A column):

In [77]: df.drop('A',1).assign(month=df.month + pd.offsets.MonthEnd(12))
Out[77]:
        month          B
0  2018-02-28   0.000000
1  2018-03-31   3.797394
2  2018-04-30   3.754154
3  2018-05-31   7.038814
4  2018-06-30  11.072340
5  2018-07-31  11.562738
6  2018-08-31  10.692814
7  2018-09-30  11.321464
8  2018-10-31   7.688380
9  2018-11-30   4.381660
10 2018-12-31   3.671854
11 2019-01-31   3.798000
12 2019-02-28   4.784000
13 2019-03-31   5.384000
14 2019-04-30   1.346000
15 2019-05-31   0.000000
16 2019-06-30   0.000000
17 2019-07-31   0.000000
18 2019-08-31   0.000000
19 2019-09-30   0.000000
20 2019-10-31   0.000000
21 2019-11-30   0.000000
22 2019-12-31   0.000000

now we can merge it with the original DF (we don't need B column in the original DF):

In [79]: (df.drop('B',1)
    ...:    .merge(df.drop('A',1)
    ...:             .assign(month=df.month + pd.offsets.MonthEnd(12)),
    ...:           on='month', how='left'))
Out[79]:
        month         A          B
0  2017-02-28  0.737757        NaN
1  2017-03-31  0.759479        NaN
2  2017-04-30  0.750831        NaN
3  2017-05-31  0.703881        NaN
4  2017-06-30  0.692021        NaN
5  2017-07-31  0.680161        NaN
6  2017-08-31  0.668301        NaN
7  2017-09-30  0.707592        NaN
8  2017-10-31  0.698944        NaN
9  2017-11-30  0.625951        NaN
10 2017-12-31  0.611976        NaN
11 2018-01-31  0.633000        NaN
12 2018-02-28  0.598000   0.000000
13 2018-03-31  0.673000   3.797394
14 2018-04-30  0.673000   3.754154
15 2018-05-31  0.609000   7.038814
16 2018-06-30  0.609000  11.072340
17 2018-07-31  0.609000  11.562738
18 2018-08-31  0.609000  10.692814
19 2018-09-30  0.673000  11.321464
20 2018-10-31  0.673000   7.688380
21 2018-11-30  0.598000   4.381660
22 2018-12-31  0.598000   3.671854

then using .eval("C = A * B", inplace=False) we cann generate a new column "on the fly"

How to compute a column depending on previous values of one and current values of another column

Answers (1)

Related Questions