Reputation: 125
I have several monthly, datetime-indexed cumulative Pandas series that I would like to de-cumulate so that I get the value for each individual month.
Within each year, Jan is Jan, Feb is Jan + Feb, Mar is Jan + Feb + Mar and so on, and the running total resets at Jan of the next year.
To complicate matters, some of these series start in Feb rather than Jan.
Here's an example series:
2016-02-29 112.3
2016-03-31 243.0
2016-04-30 360.1
2016-05-31 479.5
2016-06-30 643.0
2016-07-31 757.6
2016-08-31 874.5
2016-09-30 1051.8
2016-10-31 1203.4
2016-11-30 1358.3
2016-12-31 1573.5
2017-01-31 75.0
2017-02-28 140.5
2017-03-31 290.4
2017-04-30 416.6
2017-05-31 548.2
2017-06-30 746.6
2017-07-31 863.5
2017-08-31 985.4
2017-09-30 1160.1
2017-10-31 1302.5
2017-11-30 1465.7
2017-12-31 1694.1
2018-01-31 74.0
2018-02-28 146.3
2018-03-31 300.9
2018-04-30 421.9
2018-05-31 564.1
2018-06-30 771.4
I thought one way to do this would be to use df.diff() to get the differences, which are correct for every month except Jan, replace the incorrect Jan values with NaN, and then do a df.update(original df) to fill in the NaNs with the correct values.
I'm having trouble replacing the Jan data with NaNs. Could anyone help with this, or suggest another solution?
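For reference, the approach described above can be sketched roughly as follows. This is only an illustration on a five-row slice of the example data; the names `s` and `monthly` are made up for the sketch. It uses `fillna` rather than `update` for the last step, because `update` by default overwrites every overlapping value with the non-NA values of the other object, not just the NaNs.

```python
import pandas as pd

# Small slice of the example series spanning a year boundary (cumulative values).
s = pd.Series(
    [1465.7, 1694.1, 74.0, 146.3, 300.9],
    index=pd.to_datetime(
        ["2017-11-30", "2017-12-31", "2018-01-31", "2018-02-28", "2018-03-31"]
    ),
)

# Step 1: plain differences; correct everywhere except January and the first row.
monthly = s.diff()

# Step 2: blank out the January rows, whose difference crosses the yearly reset.
monthly = monthly.mask(s.index.month == 1)

# Step 3: fill the NaNs (January and the very first row) from the original series.
monthly = monthly.fillna(s)

print(monthly)
```

Note that the first row has no predecessor, so it keeps its cumulative value. That is the desired result only when the series starts in Jan.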
Upvotes: 2
Views: 335
Reputation: 402814
I would solve this with groupby + diff + fillna:
df.asfreq('M').groupby(pd.Grouper(freq='Y')).diff().fillna(df)
Value
2016-02-29 112.3
2016-03-31 130.7
2016-04-30 117.1
2016-05-31 119.4
2016-06-30 163.5
2016-07-31 114.6
2016-08-31 116.9
2016-09-30 177.3
2016-10-31 151.6
2016-11-30 154.9
2016-12-31 215.2
2017-01-31 75.0
2017-02-28 65.5
2017-03-31 149.9
2017-04-30 126.2
2017-05-31 131.6
2017-06-30 198.4
2017-07-31 116.9
2017-08-31 121.9
2017-09-30 174.7
2017-10-31 142.4
2017-11-30 163.2
2017-12-31 228.4
2018-01-31 74.0
2018-02-28 72.3
2018-03-31 154.6
2018-04-30 121.0
2018-05-31 142.2
2018-06-30 207.3
This assumes the index is the date column (a DatetimeIndex) and that "Value" is a float column.
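A self-contained variant of the same idea, as a sketch: it groups by `df.index.year` instead of `pd.Grouper`, which is a substitution made here to avoid frequency aliases (recent pandas versions deprecate 'M' and 'Y' in favour of 'ME' and 'YE'). The small frame and the names `years`, `monthly` and `restored` are illustrative only. The last step re-cumulates per year as a sanity check that the de-cumulation is reversible.

```python
import pandas as pd

# Five rows of the example data around the 2016/2017 boundary (cumulative values).
df = pd.DataFrame(
    {"Value": [1358.3, 1573.5, 75.0, 140.5, 290.4]},
    index=pd.to_datetime(
        ["2016-11-30", "2016-12-31", "2017-01-31", "2017-02-28", "2017-03-31"]
    ),
)

# Group rows by calendar year so that diff() never crosses a yearly reset.
years = df.index.year

# Within each year the first row becomes NaN; fill it from the original frame.
monthly = df.groupby(years).diff().fillna(df)

# Sanity check: a per-year cumulative sum should reproduce the input.
restored = monthly.groupby(years).cumsum()

print(monthly)
```

Unlike the asfreq('M') version, this does not insert rows for missing months, so it assumes the series has no gaps within a year.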
Upvotes: 1