Reputation: 2545
I have a Pandas DataFrame similar to the below. Please not this does not need to be in this format, it can be put back into pivoted fashion and have each variable
be a column if that's easier.
date variable value
0 2014-01-31 item1 0.070898
1 2014-02-28 item1 0.064725
2 2014-03-31 item1 0.101292
3 2014-04-30 item1 0.041898
4 2014-05-31 item1 0.094894
5 2014-06-30 item1 0.110287
6 2014-07-31 item1 0.081844
7 2014-08-31 item1 0.043819
8 2014-09-30 item1 0.009196
9 2014-10-31 item1 0.076482
10 2014-11-30 item1 0.055906
11 2014-12-31 item1 0.042028
12 2015-01-31 item1 0.115469
13 2015-02-28 item1 0.130457
14 2015-03-31 item1 0.054314
15 2015-04-30 item1 0.141286
16 2015-05-31 item1 0.109387
17 2015-06-30 item1 0.066154
18 2015-07-31 item1 0.039390
19 2015-08-31 item1 0.071276
20 2015-09-30 item1 0.088784
21 2015-10-31 item1 0.054234
22 2015-11-30 item1 0.111854
23 2015-12-31 item1 0.005053
24 2016-01-31 item1 0.145953
25 2016-02-29 item1 0.138051
26 2016-03-31 item1 0.030395
27 2016-04-30 item1 0.055843
28 2016-05-31 item1 0.037960
29 2016-06-30 item1 0.147318
.. ... ... ...
120 2015-09-30 ref 0.043185
121 2015-10-31 ref 0.046849
122 2015-11-30 ref 0.008139
123 2015-12-31 ref 0.011222
124 2016-01-31 ref 0.026408
125 2016-02-29 ref 0.040404
126 2016-03-31 ref 0.039585
127 2016-04-30 ref 0.028782
128 2016-05-31 ref 0.002799
129 2016-06-30 ref 0.040413
130 2016-07-31 ref 0.004451
131 2016-08-31 ref 0.001946
132 2016-09-30 ref 0.029548
133 2016-10-31 ref 0.029416
134 2016-11-30 ref 0.024893
135 2016-12-31 ref 0.037656
136 2017-01-31 ref 0.045642
137 2017-02-28 ref 0.046690
138 2017-03-31 ref 0.019193
139 2017-04-30 ref 0.016362
140 2017-05-31 ref 0.025512
141 2017-06-30 ref 0.021625
142 2017-07-31 ref 0.026472
143 2017-08-31 ref 0.016531
144 2017-09-30 ref 0.047447
145 2017-10-31 ref 0.032180
146 2017-11-30 ref 0.048635
147 2017-12-31 ref 0.040450
148 2018-01-31 ref 0.047735
149 2018-02-28 ref 0.013727
For each item in variable
I would like to apply a rolling 12 month calculation that accepts as arguments the rolling 12 months of value
for that item, and the rolling 12 months of values where variable
equals ref
. The function is of the form:
def func(series1,series2):
do something with series
return single_value_from_doiong_something
My issue is how to pass two different series into this function to return my single value. Does anyone know how to do this?
Upvotes: 0
Views: 96
Reputation: 9019
Regarding the function you mention, after you restructure your original dataframe with a join()
, this may be closer to what you are after:
combined = df[df['variable']!='ref'].set_index('date').join(df[df['variable']=='ref'].set_index('date'), lsuffix='', rsuffix='_ref').drop('variable_ref', axis=1)
def func(series, ref_series):
#As an example
return series.mean()/ref_series.mean()
combined.groupby('variable').rolling(12).apply(lambda x: func(x, combined.loc[x.index]['value_ref']), raw=False).drop('value_ref', axis=1)
This example yields the following (NaN
are present due to the gaps in your sample data):
value
variable date
item1 2014-01-31 NaN
2014-02-28 NaN
2014-03-31 NaN
2014-04-30 NaN
2014-05-31 NaN
2014-06-30 NaN
2014-07-31 NaN
2014-08-31 NaN
2014-09-30 NaN
2014-10-31 NaN
2014-11-30 NaN
2014-12-31 NaN
2015-01-31 NaN
2015-02-28 NaN
2015-03-31 NaN
2015-04-30 NaN
2015-05-31 NaN
2015-06-30 NaN
2015-07-31 NaN
2015-08-31 NaN
2015-09-30 1.912186
2015-10-31 1.793184
2015-11-30 2.609254
2015-12-31 3.009455
2016-01-31 3.123833
2016-02-29 2.910599
2016-03-31 2.708132
2016-04-30 2.497878
2016-05-31 2.561760
2016-06-30 2.681712
Upvotes: 1