rpl

Reputation: 451

python pandas rolling function with two arguments

My beginner's love for Python is undergoing a hard trial...

I need to calculate a function over a rolling window of fixed length (say, 5). The function takes two parameters. I am aware of the answer to a nearly identical question, but I keep getting errors when I apply it.

My code is simple:

import numpy as np
import pandas as pd
import scipy as sp
import scipy.stats

df = pd.DataFrame( {'A' : np.arange(20), 'B' : np.random.randint(0,20,20)})

def my_tau2(idx):
    x = df.loc[idx, 'A'].astype('float')
    y = df.loc[idx, 'B'].astype('float')
    return scipy.stats.mstats.kendalltau(x, y)[0] ## breaks without this [0]

pd.rolling_apply(np.arange(len(df), dtype = np.dtype('int16')), 5, my_tau2)

And I keep getting the following error:

  File "<ipython-input-6-d6cbc608d2f0>", line 7, in <module>
    pd.rolling_apply(np.arange(len(df), dtype = np.dtype('int16')), 5, my_tau2)
  File "D:\Users\502031217\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\stats\moments.py", line 584, in rolling_apply
    kwargs=kwargs)
  File "D:\Users\502031217\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\stats\moments.py", line 240, in ensure_compat
    result = getattr(r, name)(*args, **kwds)
  File "D:\Users\502031217\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\core\window.py", line 863, in apply
    return super(Rolling, self).apply(func, args=args, kwargs=kwargs)
  File "D:\Users\502031217\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\core\window.py", line 621, in apply
    center=False)
  File "D:\Users\502031217\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\core\window.py", line 560, in _apply
    result = calc(values)
  File "D:\Users\502031217\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\core\window.py", line 555, in calc
    return func(x, window, min_periods=self.min_periods)
  File "D:\Users\502031217\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\core\window.py", line 618, in f
    kwargs)
  File "pandas\algos.pyx", line 1831, in pandas.algos.roll_generic (pandas\algos.c:51581)
TypeError: a float is required
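A quick way to see what the rolling machinery actually hands to the callback is to record each window's dtype. This is a sketch using the newer `Series.rolling(...).apply(..., raw=True)` spelling rather than `pd.rolling_apply`, and `record_window` is a hypothetical helper. As far as I can tell, pandas converts the values to float64 before windowing, so int16 positions like the ones above arrive as floats (which is what `df.loc[idx, ...]` then receives), and `roll_generic` appears to store each return value into a float array, which is why the callback must return one plain float per window.

```python
import numpy as np
import pandas as pd

seen_dtypes = []


def record_window(window):
    # Hypothetical helper: remember what kind of array pandas passed in.
    seen_dtypes.append(window.dtype)
    # Whatever comes back must be a single float per window.
    return float(window[-1])


positions = pd.Series(np.arange(10, dtype=np.int16))
result = positions.rolling(5).apply(record_window, raw=True)
print(set(seen_dtypes))  # the int16 positions arrive as float64
```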

I've been struggling with that and I'm going bonkers. My module versions are:

Any hints w.r.t. how to fix or calculate this in another way are wholeheartedly welcome.

Upvotes: 2

Views: 1262

Answers (1)

Marjan Moderc

Reputation: 2859

I am not familiar with the Kendall tau coefficient, but according to the post linked above, you may need to rewrite your tau function so that it returns a single value. Judging by that link, I would design your tau function like the following (still not very flexible, in my opinion, since it uses hardcoded column names from the outer scope):

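def my_tau2(idx):
    # rolling_apply passes each window of positions as a float64 array; cast back to int for iloc
    df_tau = df[["A","B"]].iloc[idx.astype(int)]
    # kendalltau returns (tau, p-value); keep only tau so a single float is returned
    return scipy.stats.mstats.kendalltau(df_tau["A"], df_tau["B"])[0]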
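Since this version still reads hardcoded column names from the outer scope, here is a sketch of one way around that, assuming a pandas version with `Series.rolling(...).apply(..., raw=True)` (0.23 or later). `make_window_tau` is a hypothetical helper name: it binds the frame and the two column names in a closure, and casts the float positions back to integers before indexing. It uses `scipy.stats.kendalltau` instead of the `mstats` variant.

```python
import numpy as np
import pandas as pd
from scipy.stats import kendalltau


def make_window_tau(frame, xcol, ycol):
    """Hypothetical helper: map a window of row positions to Kendall's tau.

    The frame and column names are bound here instead of being read from
    global scope, so the same helper works for any pair of columns.
    """
    x = frame[xcol].to_numpy(dtype=float)
    y = frame[ycol].to_numpy(dtype=float)

    def window_tau(idx):
        # Rolling apply hands positions over as floats; cast them back to ints.
        pos = np.asarray(idx).astype(int)
        # Keep only the statistic so exactly one float is returned.
        return float(kendalltau(x[pos], y[pos])[0])

    return window_tau


df = pd.DataFrame({'A': np.arange(20), 'B': np.random.randint(0, 20, 20)})
# Roll over row positions 0..n-1 and let the closure slice both columns.
positions = pd.Series(np.arange(len(df)), index=df.index)
df['tau'] = positions.rolling(5).apply(make_window_tau(df, 'A', 'B'), raw=True)
```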

That allows rolling_apply to run (and, of course, the result should be saved into the dataframe, which you didn't seem to have done):

df["tau"] = pd.rolling_apply(np.arange(len(df)), 5, my_tau2)
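`pd.rolling_apply` was deprecated in pandas 0.18 (the `ensure_compat` frame in the traceback appears to be that compatibility shim) and was later removed. A sketch of an alternative for newer pandas, assuming `raw=False` is available (0.23 or later): roll over column A alone and use each window's index labels to fetch the matching rows of B. `tau_against_b` is a hypothetical name, and `scipy.stats.kendalltau` is swapped in for the `mstats` variant.

```python
import numpy as np
import pandas as pd
from scipy.stats import kendalltau

df = pd.DataFrame({'A': np.arange(20), 'B': np.random.randint(0, 20, 20)})


def tau_against_b(window):
    # With raw=False each window is a Series that keeps its index labels,
    # so the matching rows of column B can be looked up by label.
    return kendalltau(window, df.loc[window.index, 'B'])[0]


# Replacement for pd.rolling_apply: Series.rolling(...).apply(...)
df['tau'] = df['A'].rolling(5).apply(tau_against_b, raw=False)
```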

Running this produced the following output:

     A   B       tau
0    0   0       NaN
1    1  11       NaN
2    2   2       NaN
3    3  11       NaN
4    4  17  0.737865
5    5   9  0.105409
6    6   5  0.000000
7    7   9 -0.527046
8    8  15 -0.105409
9    9  11  0.527046
10  10   4  0.000000
11  11   6 -0.400000
12  12  14 -0.200000
13  13  19  0.600000
14  14   0  0.200000
15  15  19  0.316228
16  16   9 -0.105409
17  17   1 -0.316228
18  18  13  0.200000
19  19  16  0.000000
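To cross-check a table like this one without relying on any rolling API, a plain loop over trailing windows is enough. This is a minimal sketch assuming `scipy.stats.kendalltau` (not the `mstats` version) and a hypothetical helper `rolling_kendall_tau`; since `B` is random, the numbers will differ from the table above on every run.

```python
import numpy as np
import pandas as pd
from scipy.stats import kendalltau


def rolling_kendall_tau(x, y, window):
    """Hypothetical helper: Kendall's tau of x vs y over each trailing window.

    The first window - 1 positions are NaN, matching pandas' rolling convention.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    out = np.full(len(x), np.nan)
    for end in range(window, len(x) + 1):
        # This window covers positions end - window .. end - 1.
        tau, _pvalue = kendalltau(x[end - window:end], y[end - window:end])
        out[end - 1] = tau
    return out


df = pd.DataFrame({'A': np.arange(20), 'B': np.random.randint(0, 20, 20)})
df['tau'] = rolling_kendall_tau(df['A'], df['B'], 5)
```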

Upvotes: 3
