Reputation: 1057
I have a pandas dataframe on which I wish to apply my own custom rolling function as follows:
def testms(x, field):
mu = np.sum(x[field])
si = np.sum(x[field])/len(x[field])
x['mu'] = mu
x['si'] = si
return x
df2 = pd.concat([pd.DataFrame({'A':[1,1,1,1,1,2,2,2,2,2]}),
pd.DataFrame({'B':random_dates(pd.to_datetime('2015-01-01'),
pd.to_datetime('2018-01-01'), 10)}),
pd.DataFrame({'C':np.random.rand(10)})],axis=1)
df2
A B C
0 1 2016-08-25 01:09:42.953011200 0.791725
1 1 2017-02-23 13:30:20.296310399 0.528895
2 1 2016-10-23 05:33:14.994806400 0.568045
3 1 2016-08-20 17:41:03.991027200 0.925597
4 1 2016-04-09 17:59:00.805200000 0.071036
5 2 2016-12-09 13:06:00.751737600 0.087129
6 2 2016-04-25 00:47:45.953232000 0.020218
7 2 2017-09-05 06:35:58.432531200 0.832620
8 2 2017-11-23 03:18:47.370528000 0.778157
9 2 2016-02-25 15:14:53.907532800 0.870012
tester = lambda x: testms(x, 'C')
df2.set_index('B').groupby('A')['C'].rolling('90D', min_periods=1).apply(tester).reset_index()
However when I apply the above code, I get the following error:
IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices
Upvotes: 2
Views: 634
Reputation: 862406
If use Rolling.apply
it working differently like GroupBy.apply
- it processing each columns separately and not possible return multiple columns, only scalars:
So in your solution are necessary 2 functions, where is not possible specify column, but column for processing is specify after groupby
:
def testms1(x):
mu = np.sum(x)
return mu
def testms2(x):
#same like mean
#si = np.sum(x)/len(x)
si = np.mean(x)
return si
tester1 = lambda x: testms1(x)
tester2 = lambda x: testms2(x)
r = df2.set_index('B').groupby('A')['C'].rolling('90D', min_periods=1)
s1 = r.apply(tester1, raw=False).rename('mu')
s2 = r.apply(tester2, raw=False).rename('si')
df = pd.concat([s1, s2], axis=1).reset_index()
print (df)
A B mu si
0 1 2016-08-25 01:09:42.953011200 0.791725 0.791725
1 1 2017-02-23 13:30:20.296310399 0.528895 0.528895
2 1 2016-10-23 05:33:14.994806400 1.096940 0.548470
3 1 2016-08-20 17:41:03.991027200 2.022537 0.674179
4 1 2016-04-09 17:59:00.805200000 2.093573 0.523393
5 2 2016-12-09 13:06:00.751737600 0.087129 0.087129
6 2 2016-04-25 00:47:45.953232000 0.107347 0.053673
7 2 2017-09-05 06:35:58.432531200 0.832620 0.832620
8 2 2017-11-23 03:18:47.370528000 1.610777 0.805389
9 2 2016-02-25 15:14:53.907532800 2.480789 0.826930
Alternative solution with Resampler.aggregate
:
r = df2.set_index('B').groupby('A')['C'].rolling('90D', min_periods=1)
df1 = r.agg(['sum','mean']).rename(columns={'sum':'mu', 'mean':'si'}).reset_index()
print (df1)
A B mu si
0 1 2016-08-25 01:09:42.953011200 0.791725 0.791725
1 1 2017-02-23 13:30:20.296310399 0.528895 0.528895
2 1 2016-10-23 05:33:14.994806400 1.096940 0.548470
3 1 2016-08-20 17:41:03.991027200 2.022537 0.674179
4 1 2016-04-09 17:59:00.805200000 2.093573 0.523393
5 2 2016-12-09 13:06:00.751737600 0.087129 0.087129
6 2 2016-04-25 00:47:45.953232000 0.107347 0.053673
7 2 2017-09-05 06:35:58.432531200 0.832620 0.832620
8 2 2017-11-23 03:18:47.370528000 1.610777 0.805389
9 2 2016-02-25 15:14:53.907532800 2.480789 0.826930
Upvotes: 1