Reputation: 451
My beginner's love for Python is undergoing a hard trial...
I need to evaluate a function over a rolling window of fixed length (say, 5). The function takes two inputs (two columns of the DataFrame). I am well aware of the answer here, which is nearly identical, but I keep getting errors.
My code is simple:
import numpy as np
import pandas as pd
import scipy as sp
import scipy.stats
df = pd.DataFrame( {'A' : np.arange(20), 'B' : np.random.randint(0,20,20)})
def my_tau2(idx):
    x = df.loc[idx, 'A'].astype('float')
    y = df.loc[idx, 'B'].astype('float')
    return scipy.stats.mstats.kendalltau(x, y)[0]  ## breaks without this [0]
pd.rolling_apply(np.arange(len(df), dtype = np.dtype('int16')), 5, my_tau2)
And I keep getting the following error:
File "<ipython-input-6-d6cbc608d2f0>", line 7, in <module>
pd.rolling_apply(np.arange(len(df), dtype = np.dtype('int16')), 5, my_tau2)
File "D:\Users\502031217\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\stats\moments.py", line 584, in rolling_apply
kwargs=kwargs)
File "D:\Users\502031217\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\stats\moments.py", line 240, in ensure_compat
result = getattr(r, name)(*args, **kwds)
File "D:\Users\502031217\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\core\window.py", line 863, in apply
return super(Rolling, self).apply(func, args=args, kwargs=kwargs)
File "D:\Users\502031217\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\core\window.py", line 621, in apply
center=False)
File "D:\Users\502031217\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\core\window.py", line 560, in _apply
result = calc(values)
File "D:\Users\502031217\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\core\window.py", line 555, in calc
return func(x, window, min_periods=self.min_periods)
File "D:\Users\502031217\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\core\window.py", line 618, in f
kwargs)
File "pandas\algos.pyx", line 1831, in pandas.algos.roll_generic (pandas\algos.c:51581)
TypeError: a float is required
I've been struggling with that and I'm going bonkers. My module versions are:
Any hints w.r.t. how to fix or calculate this in another way are wholeheartedly welcome.
Upvotes: 2
Views: 1262
Reputation: 2859
I am not familiar with the Kendall tau coefficient, but according to the post linked above, it seems your tau function should return a single value. Judging by that link, I would write your tau function as follows (still not very flexible, in my opinion, since it relies on a DataFrame and hardcoded column names from the outer scope):
def my_tau2(idx):
    df_tau = df[["A", "B"]].iloc[idx]
    return scipy.stats.mstats.kendalltau(df_tau["A"], df_tau["B"])[0]
That lets rolling_apply run (and, of course, the result can be saved into the DataFrame, which you don't seem to have done):
df["tau"] = pd.rolling_apply(np.arange(len(df)), 5, my_tau2)
Running this produced the following output:
     A   B        tau
0    0   0        NaN
1    1  11        NaN
2    2   2        NaN
3    3  11        NaN
4    4  17   0.737865
5    5   9   0.105409
6    6   5   0.000000
7    7   9  -0.527046
8    8  15  -0.105409
9    9  11   0.527046
10  10   4   0.000000
11  11   6  -0.400000
12  12  14  -0.200000
13  13  19   0.600000
14  14   0   0.200000
15  15  19   0.316228
16  16   9  -0.105409
17  17   1  -0.316228
18  18  13   0.200000
19  19  16   0.000000
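For anyone on a current pandas version: pd.rolling_apply has since been removed, and the same index trick can be written with Series.rolling(...).apply(...). Below is a minimal sketch of that idea which also takes the DataFrame and column names as arguments instead of reading them from the outer scope. The names rolling_kendall_tau and tau_for_window are made up for illustration, and I use scipy.stats.kendalltau here rather than the mstats variant from the question, so treat it as one possible approach, not a drop-in replacement.

```python
import numpy as np
import pandas as pd
from scipy import stats


def rolling_kendall_tau(frame, x_col, y_col, window):
    """Kendall's tau between two columns over a trailing window of rows.

    Hypothetical helper: rolls over row positions, then looks up both columns.
    """
    def tau_for_window(positions):
        # With raw=True, apply() passes the window as a float64 ndarray,
        # so cast back to integers before positional indexing.
        rows = frame.iloc[positions.astype(int)]
        # Keep only the tau statistic ([0]); apply() needs a single float.
        return float(stats.kendalltau(rows[x_col], rows[y_col])[0])

    positions = pd.Series(np.arange(len(frame)), index=frame.index)
    return positions.rolling(window).apply(tau_for_window, raw=True)


# Same shape of data as in the question; the seed is arbitrary.
rng = np.random.default_rng(0)
df = pd.DataFrame({"A": np.arange(20), "B": rng.integers(0, 20, 20)})
df["tau"] = rolling_kendall_tau(df, "A", "B", window=5)
```

As with rolling_apply, the first window - 1 rows come out as NaN because the window is not yet full. The inner function must still return a single number, which is why the [0] is kept.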
Upvotes: 3