Reputation: 451
This is a somewhat extension to my previous problem python pandas rolling function with two arguments .
How do I perform the same by group? Let's say that the 'C' column below is used for grouping.
I am struggling to:
The expected result would be a DataFrame like the one below:
I have been trying the 'pass an index' workaround as described in the link above, but the complexity of this case is beyond my skills :-( . This is a toy example, not that far from what I am working with, so for simplicity i used randomly generated data.
rand = np.random.RandomState(1)
dff = pd.DataFrame({'A' : np.arange(20),
'B' : rand.randint(100, 120, 20),
'C' : rand.randint(0, 2, 20)})
def my_tau_indx(indx):
x = dff.iloc[indx, 0]
y = dff.iloc[indx, 1]
tau = sp.stats.mstats.kendalltau(x, y)[0]
return tau
dff['tau'] = dff.sort_values(['C', 'A']).groupby('C').rolling(window = 5).apply(my_tau_indx, args = ([dff.index.values]))
Every fix I make creates yet another bug...
The Above issue has been solved by Nickil Maveli and it works with numpy 1.11.0, pandas 0.18.1, scipy 0.17.1, andwith conda 4.1.4. It generates some warnings, but works.
On my another machine with latest & greatest numpy 1.12.0, pandas 0.19.2, scipy 0.18.1, conda version 3.10.0 and BLAS/LAPACK - it does not work and I get the traceback below. This seems versions related since I upgraded the 1st machine it also stopped working... In the name of science... ;-)
As Nickil suggested, this was due to incompatibility between numpy 1.11 and 1.12. Downgrading numpy helped. Since I had had BLAS/LAPACK on a Windows, I installed numpy 1.11.3+mkl from http://www.lfd.uci.edu/~gohlke/pythonlibs/ .
Traceback (most recent call last):
File "<ipython-input-4-bbca2c0e986b>", line 16, in <module>
t = grp.apply(func)
File "C:\Apps\Anaconda\v2_1_0_x64\envs\python35\lib\site-packages\pandas\core\groupby.py", line 651, in apply
return self._python_apply_general(f)
File "C:\Apps\Anaconda\v2_1_0_x64\envs\python35\lib\site-packages\pandas\core\groupby.py", line 655, in _python_apply_general
self.axis)
File "C:\Apps\Anaconda\v2_1_0_x64\envs\python35\lib\site-packages\pandas\core\groupby.py", line 1527, in apply
res = f(group)
File "C:\Apps\Anaconda\v2_1_0_x64\envs\python35\lib\site-packages\pandas\core\groupby.py", line 647, in f
return func(g, *args, **kwargs)
File "<ipython-input-4-bbca2c0e986b>", line 15, in <lambda>
func = lambda x: pd.Series(pd.rolling_apply(np.arange(len(x)), 5, my_tau_indx), x.index)
File "C:\Apps\Anaconda\v2_1_0_x64\envs\python35\lib\site-packages\pandas\stats\moments.py", line 584, in rolling_apply
kwargs=kwargs)
File "C:\Apps\Anaconda\v2_1_0_x64\envs\python35\lib\site-packages\pandas\stats\moments.py", line 240, in ensure_compat
result = getattr(r, name)(*args, **kwds)
File "C:\Apps\Anaconda\v2_1_0_x64\envs\python35\lib\site-packages\pandas\core\window.py", line 863, in apply
return super(Rolling, self).apply(func, args=args, kwargs=kwargs)
File "C:\Apps\Anaconda\v2_1_0_x64\envs\python35\lib\site-packages\pandas\core\window.py", line 621, in apply
center=False)
File "C:\Apps\Anaconda\v2_1_0_x64\envs\python35\lib\site-packages\pandas\core\window.py", line 560, in _apply
result = calc(values)
File "C:\Apps\Anaconda\v2_1_0_x64\envs\python35\lib\site-packages\pandas\core\window.py", line 555, in calc
return func(x, window, min_periods=self.min_periods)
File "C:\Apps\Anaconda\v2_1_0_x64\envs\python35\lib\site-packages\pandas\core\window.py", line 618, in f
kwargs)
File "pandas\algos.pyx", line 1831, in pandas.algos.roll_generic (pandas\algos.c:51768)
File "<ipython-input-4-bbca2c0e986b>", line 8, in my_tau_indx
x = dff.iloc[indx, 0]
File "C:\Apps\Anaconda\v2_1_0_x64\envs\python35\lib\site-packages\pandas\core\indexing.py", line 1294, in __getitem__
return self._getitem_tuple(key)
File "C:\Apps\Anaconda\v2_1_0_x64\envs\python35\lib\site-packages\pandas\core\indexing.py", line 1560, in _getitem_tuple
retval = getattr(retval, self.name)._getitem_axis(key, axis=axis)
File "C:\Apps\Anaconda\v2_1_0_x64\envs\python35\lib\site-packages\pandas\core\indexing.py", line 1614, in _getitem_axis
return self._get_loc(key, axis=axis)
File "C:\Apps\Anaconda\v2_1_0_x64\envs\python35\lib\site-packages\pandas\core\indexing.py", line 96, in _get_loc
return self.obj._ixs(key, axis=axis)
File "C:\Apps\Anaconda\v2_1_0_x64\envs\python35\lib\site-packages\pandas\core\frame.py", line 1908, in _ixs
label = self.index[i]
File "C:\Apps\Anaconda\v2_1_0_x64\envs\python35\lib\site-packages\pandas\indexes\range.py", line 510, in __getitem__
return super_getitem(key)
File "C:\Apps\Anaconda\v2_1_0_x64\envs\python35\lib\site-packages\pandas\indexes\base.py", line 1275, in __getitem__
result = getitem(key)
IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices
The final check:
Upvotes: 2
Views: 1777
Reputation: 29711
One way to achieve would be to iterate through every group and use pd.rolling_apply
on every such groups.
import scipy.stats as ss
def my_tau_indx(indx):
x = dff.iloc[indx, 0]
y = dff.iloc[indx, 1]
tau = ss.mstats.kendalltau(x, y)[0]
return tau
grp = dff.sort_values(['A', 'C']).groupby('C', group_keys=False)
func = lambda x: pd.Series(pd.rolling_apply(np.arange(len(x)), 5, my_tau_indx), x.index)
t = grp.apply(func)
dff.reindex(t.index).assign(tau=t)
EDIT:
def my_tau_indx(indx):
x = dff.ix[indx, 0]
y = dff.ix[indx, 1]
tau = ss.mstats.kendalltau(x, y)[0]
return tau
grp = dff.sort_values(['A', 'C']).groupby('C', group_keys=False)
t = grp.rolling(5).apply(my_tau_indx).get('A')
grp.head(dff.shape[0]).reindex(t.index).assign(tau=t)
Upvotes: 1