visualnotsobasic
visualnotsobasic

Reputation: 428

pandas .diff() but use first cell as difference between last cell in prior column

say that i have a df in the following format:

year        2016  2017  2018  2019  2020  min  max    avg
month                                                    
2021-01-01   284   288   311   383   476  284  476  357.4
2021-02-01   301   315   330   388   441  301  441  359.6
2021-03-01   303   331   341   400   475  303  475  375.4
2021-04-01   283   300   339   419   492  283  492  372.6
2021-05-01   287   288   346   420   445  287  445  359.7
2021-06-01   283   292   340   424   446  283  446  359.1
2021-07-01   294   296   360   444   452  294  452  370.3
2021-08-01   294   315   381   445   451  294  451  375.9
2021-09-01   288   331   405   464   459  288  464  385.6
2021-10-01   327   349   424   457   453  327  457  399.1
2021-11-01   316   351   413   469   471  316  471  401.0
2021-12-01   259   329   384   467   465  259  467  375.7

and i would like to get the difference of the 2020 column by using df['delta'] = df['2020'].diff()

this will obviously return NaN for the first value in the column. how can i make it so that it automatically interprets that diff as the difference between the FIRST value of 2020 and the LAST value of 2019?

Upvotes: 2

Views: 374

Answers (2)

Andrej Kesely
Andrej Kesely

Reputation: 195438

If you want only for 2020:

df["delta"] = pd.concat([df["2019"], df["2020"]]).diff().tail(len(df))

Prints:

          year  2016  2017  2018  2019  2020  min  max    avg  delta
0   2021-01-01   284   288   311   383   476  284  476  357.4    9.0
1   2021-02-01   301   315   330   388   441  301  441  359.6  -35.0
2   2021-03-01   303   331   341   400   475  303  475  375.4   34.0
3   2021-04-01   283   300   339   419   492  283  492  372.6   17.0
4   2021-05-01   287   288   346   420   445  287  445  359.7  -47.0
5   2021-06-01   283   292   340   424   446  283  446  359.1    1.0
6   2021-07-01   294   296   360   444   452  294  452  370.3    6.0
7   2021-08-01   294   315   381   445   451  294  451  375.9   -1.0
8   2021-09-01   288   331   405   464   459  288  464  385.6    8.0
9   2021-10-01   327   349   424   457   453  327  457  399.1   -6.0
10  2021-11-01   316   351   413   469   471  316  471  401.0   18.0
11  2021-12-01   259   329   384   467   465  259  467  375.7   -6.0

Upvotes: 3

BENY
BENY

Reputation: 323276

You can try unstack then do the diff, notice the first item in 2016 will still be NaN

out = df.drop(['min','max','avg'],1).unstack().diff().unstack(0)
            2016  2017  2018  2019  2020
2021-01-01   NaN  29.0 -18.0  -1.0   9.0
2021-02-01  17.0  27.0  19.0   5.0 -35.0
2021-03-01   2.0  16.0  11.0  12.0  34.0
2021-04-01 -20.0 -31.0  -2.0  19.0  17.0
2021-05-01   4.0 -12.0   7.0   1.0 -47.0
2021-06-01  -4.0   4.0  -6.0   4.0   1.0
2021-07-01  11.0   4.0  20.0  20.0   6.0
2021-08-01   0.0  19.0  21.0   1.0  -1.0
2021-09-01  -6.0  16.0  24.0  19.0   8.0
2021-10-01  39.0  18.0  19.0  -7.0  -6.0
2021-11-01 -11.0   2.0 -11.0  12.0  18.0
2021-12-01 -57.0 -22.0 -29.0  -2.0  -6.0

Upvotes: 3

Related Questions