Apply a function from a groupby transform

Question

My pandas DataFrame looks like this:

Date    Ticker  Open    High    Low Adj Close   Adj_Close   Volume
2016-04-18  vws.co  445.0   449.2   441.7   447.3   447.3   945300
2016-04-19  vws.co  449.0   455.8   448.3   450.9   450.9   907700
2016-04-20  vws.co  451.0   452.5   435.4   436.6   436.6   1268100
2016-04-21  vws.co  440.1   442.9   428.4   435.5   435.5   1308300
2016-04-22  vws.co  435.5   435.5   435.5   435.5   435.5   0
2016-04-25  vws.co  431.0   436.7   424.4   430.0   430.0   1311700
2016-04-18  nflx    109.9   110.7   106.02  108.4   108.4   27001500
2016-04-19  nflx    99.49   101.37  94.2    94.34   94.34   55623900
2016-04-20  nflx    94.34   96.98   93.14   96.77   96.77   25633600
2016-04-21  nflx    97.31   97.38   94.78   94.98   94.98   19859400
2016-04-22  nflx    94.85   96.69   94.21   95.9    95.9    15786000
2016-04-25  nflx    95.7    95.75   92.8    93.56   93.56   14965500

In one of its functions (which contains nested functions), my program successfully runs a groupby.

The line looks like this:

df['MA3'] = df.groupby('Ticker').Adj_Close.transform(lambda group: group.rolling(window=3).mean())
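On current pandas versions the old module-level helpers such as `pd.rolling_mean` no longer exist, and the method form `.rolling(window).mean()` is used instead. A minimal sketch of the grouped moving average on a three-day slice hand-copied from the table above (the slice is only for illustration):

```python
import pandas as pd

# Three days of two tickers, copied from the table above
df = pd.DataFrame({
    'Date': pd.to_datetime(['2016-04-18', '2016-04-19', '2016-04-20'] * 2),
    'Ticker': ['vws.co'] * 3 + ['nflx'] * 3,
    'Adj_Close': [447.3, 450.9, 436.6, 108.4, 94.34, 96.77],
})

# transform() returns one value per original row, so the result lines up
# with df and can be assigned back as a new column.
df['MA3'] = df.groupby('Ticker')['Adj_Close'].transform(
    lambda s: s.rolling(window=3).mean()
)
print(df)
```

The first two rows of each ticker are NaN because a 3-row window is not yet full.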

See my initial question and the data format here:

Select only one value in df col rows in same df for calc results from different val, and calc df only on one ticker at a time

It has now dawned on me that, rather than doing the groupby in each of my five nested functions, I would rather run the groupby once in the main program that calls the top-level function, so that all the nested functions work on the already-grouped DataFrame.

How do I apply my main function with groupby to filter my DataFrame, so that I only work on one ticker (a value in the 'Ticker' column) at a time?

The 'Ticker' column contains company identifiers such as 'aapl', 'msft' and 'nflx', each with time-series data for a time window.
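One minimal way to run the groupby only once, assuming each per-ticker helper takes and returns a DataFrame that holds a single ticker, is to iterate over the groups, call the top-level function on each one, and concatenate the results. `diff_calc` mirrors the helper in the code further down; `ma_calc` and `screener` are hypothetical names used only for this sketch.

```python
import pandas as pd

df_all = pd.DataFrame({
    'Date': pd.to_datetime(['2016-04-18', '2016-04-19', '2016-04-20'] * 2),
    'Ticker': ['vws.co'] * 3 + ['nflx'] * 3,
    'Adj_Close': [447.3, 450.9, 436.6, 108.4, 94.34, 96.77],
})


def diff_calc(df):
    # Day-over-day change; df holds one ticker only
    df['Difference'] = df['Adj_Close'].diff()
    return df


def ma_calc(df, window=3):
    # Hypothetical second helper: moving average over `window` rows
    df['MA3'] = df['Adj_Close'].rolling(window=window).mean()
    return df


def screener(df):
    # Hypothetical top-level function: copy once, then run every helper
    df = df.copy()
    df = diff_calc(df)
    df = ma_calc(df)
    return df


# The groupby runs once; each `group` contains only one ticker's rows
results = [screener(group) for ticker, group in df_all.groupby('Ticker')]

# Stitch the per-ticker frames back together in the original row order
df_out = pd.concat(results).sort_index()
print(df_out)
```

The `copy()` inside `screener` lets the helpers add columns without writing into a slice of `df_all`.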

Thanks a lot, Karasinski. This is close to what I want, but I get an error.

When I run:

def Screener(df_all, group):

    # Copy df_all to df for single ticker operations
    df = df_all.copy()
    def diff_calc(df,ticker):

        df['Difference'] = df['Adj_Close'].diff()
        return df
    df = diff_calc(df, ticker)
    return df_all

for ticker in stocklist:

    df_all[['Difference']] = df_all.groupby('Ticker').Adj_Close.apply(Screener, ticker)

I get this error:

Traceback (most recent call last):

  File "", line 1, in <module>
    runfile('C:/Users/Morten/Documents/Design/Python/CrystalBall - Local - Git/Git - CrystalBall/sandbox/screener_test simple for StockOverflowNestedFct_Getstock.py', wdir='C:/Users/Morten/Documents/Design/Python/CrystalBall - Local - Git/Git - CrystalBall/sandbox')

  File "C:\Program Files\WinPython-64bit-3.3.5.7\python-3.3.5.amd64\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 682, in runfile
    execfile(filename, namespace)

  File "C:\Program Files\WinPython-64bit-3.3.5.7\python-3.3.5.amd64\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 85, in execfile
    exec(compile(open(filename, 'rb').read(), filename, 'exec'), namespace)

  File "C:/Users/Morten/Documents/Design/Python/CrystalBall - Local - Git/Git - CrystalBall/sandbox/screener_test simple for StockOverflowNestedFct_Getstock.py", line 144, in <module>
    df_all[['Difference']] = df_all.groupby('Ticker').Adj_Close.apply(Screener, ticker)

  File "C:\Program Files\WinPython-64bit-3.3.5.7\python-3.3.5.amd64\lib\site-packages\pandas\core\groupby.py", line 663, in apply
    return self._python_apply_general(f)

  File "C:\Program Files\WinPython-64bit-3.3.5.7\python-3.3.5.amd64\lib\site-packages\pandas\core\groupby.py", line 667, in _python_apply_general
    self.axis)

  File "C:\Program Files\WinPython-64bit-3.3.5.7\python-3.3.5.amd64\lib\site-packages\pandas\core\groupby.py", line 1286, in apply
    res = f(group)

  File "C:\Program Files\WinPython-64bit-3.3.5.7\python-3.3.5.amd64\lib\site-packages\pandas\core\groupby.py", line 659, in f
    return func(g, *args, **kwargs)

  File "C:/Users/Morten/Documents/Design/Python/CrystalBall - Local - Git/Git - CrystalBall/sandbox/screener_test simple for StockOverflowNestedFct_Getstock.py", line 112, in Screener
    df = diff_calc(df, ticker)

  File "C:/Users/Morten/Documents/Design/Python/CrystalBall - Local - Git/Git - CrystalBall/sandbox/screener_test simple for StockOverflowNestedFct_Getstock.py", line 70, in diff_calc
    df['Difference'] = df['Adj_Close'].diff()

  File "C:\Program Files\WinPython-64bit-3.3.5.7\python-3.3.5.amd64\lib\site-packages\pandas\core\series.py", line 514, in __getitem__
    result = self.index.get_value(self, key)

  File "C:\Program Files\WinPython-64bit-3.3.5.7\python-3.3.5.amd64\lib\site-packages\pandas\tseries\index.py", line 1221, in get_value
    raise KeyError(key)

KeyError: 'Adj_Close'
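The `KeyError` comes from selecting `.Adj_Close` before `apply`: each call then receives a Series of closing prices rather than a DataFrame, so `df['Adj_Close']` looks for a row label called 'Adj_Close' instead of a column. (The extra `ticker` argument is also passed into the `group` parameter, not used as a filter.) A minimal sketch, assuming a default integer index for brevity, that reproduces the error and then applies on DataFrame groups instead:

```python
import pandas as pd

df_all = pd.DataFrame({
    'Ticker': ['vws.co'] * 3 + ['nflx'] * 3,
    'Adj_Close': [447.3, 450.9, 436.6, 108.4, 94.34, 96.77],
})

# What the original call hands to the function: a Series, not a DataFrame
s = df_all.groupby('Ticker')['Adj_Close'].get_group('nflx')
print(type(s).__name__)  # Series
try:
    s['Adj_Close']  # looks up a row label, not a column
except KeyError as err:
    print('KeyError:', err)


def diff_calc(df):
    # Receives one ticker's DataFrame; assign() returns a new frame
    return df.assign(Difference=df['Adj_Close'].diff())


# Select the column as a one-column DataFrame ([[...]]) so df['Adj_Close'] works.
# group_keys=False keeps the original row index so the result aligns with df_all.
out = df_all.groupby('Ticker', group_keys=False)[['Adj_Close']].apply(diff_calc)
df_all['Difference'] = out['Difference']
print(df_all)
```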

And when I use functools like this:

df_all = functools.partial(df_all.groupby('Ticker').Adj_Close.apply(Screener, ticker))

I get the same error as above:

Traceback (most recent call last):

  File "", line 1, in <module>
    runfile('C:/Users/Morten/Documents/Design/Python/CrystalBall - Local - Git/Git - CrystalBall/sandbox/screener_test simple for StockOverflowNestedFct_Getstock.py', wdir='C:/Users/Morten/Documents/Design/Python/CrystalBall - Local - Git/Git - CrystalBall/sandbox')

  File "C:\Program Files\WinPython-64bit-3.3.5.7\python-3.3.5.amd64\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 682, in runfile
    execfile(filename, namespace)

  File "C:\Program Files\WinPython-64bit-3.3.5.7\python-3.3.5.amd64\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 85, in execfile
    exec(compile(open(filename, 'rb').read(), filename, 'exec'), namespace)

  File "C:/Users/Morten/Documents/Design/Python/CrystalBall - Local - Git/Git - CrystalBall/sandbox/screener_test simple for StockOverflowNestedFct_Getstock.py", line 148, in <module>
    df_all = functools.partial(df_all.groupby('Ticker').Adj_Close.apply(Screener, [ticker]))

  File "C:\Program Files\WinPython-64bit-3.3.5.7\python-3.3.5.amd64\lib\site-packages\pandas\core\groupby.py", line 663, in apply
    return self._python_apply_general(f)

  File "C:\Program Files\WinPython-64bit-3.3.5.7\python-3.3.5.amd64\lib\site-packages\pandas\core\groupby.py", line 667, in _python_apply_general
    self.axis)

  File "C:\Program Files\WinPython-64bit-3.3.5.7\python-3.3.5.amd64\lib\site-packages\pandas\core\groupby.py", line 1286, in apply
    res = f(group)

  File "C:\Program Files\WinPython-64bit-3.3.5.7\python-3.3.5.amd64\lib\site-packages\pandas\core\groupby.py", line 659, in f
    return func(g, *args, **kwargs)

  File "C:/Users/Morten/Documents/Design/Python/CrystalBall - Local - Git/Git - CrystalBall/sandbox/screener_test simple for StockOverflowNestedFct_Getstock.py", line 114, in Screener
    df = diff_calc(df, ticker)

  File "C:/Users/Morten/Documents/Design/Python/CrystalBall - Local - Git/Git - CrystalBall/sandbox/screener_test simple for StockOverflowNestedFct_Getstock.py", line 72, in diff_calc
    df['Difference'] = df['Adj_Close'].diff()

  File "C:\Program Files\WinPython-64bit-3.3.5.7\python-3.3.5.amd64\lib\site-packages\pandas\core\series.py", line 514, in __getitem__
    result = self.index.get_value(self, key)

  File "C:\Program Files\WinPython-64bit-3.3.5.7\python-3.3.5.amd64\lib\site-packages\pandas\tseries\index.py", line 1221, in get_value
    raise KeyError(key)

KeyError: 'Adj_Close'
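In the `functools` attempt, `partial` receives the already-computed result of `apply(...)`, so the same `KeyError` is raised before `partial` is ever called. `functools.partial` is meant to wrap the function itself. A minimal sketch, where the `periods` parameter is a hypothetical extra argument added only to show argument passing:

```python
import functools

import pandas as pd

df_all = pd.DataFrame({
    'Ticker': ['vws.co'] * 3 + ['nflx'] * 3,
    'Adj_Close': [447.3, 450.9, 436.6, 108.4, 94.34, 96.77],
})


def diff_calc(df, periods=1):
    # Change over `periods` rows within one ticker's DataFrame
    return df.assign(Difference=df['Adj_Close'].diff(periods))


# Group a one-column DataFrame so df['Adj_Close'] is valid inside diff_calc
grouped = df_all.groupby('Ticker', group_keys=False)[['Adj_Close']]

# partial() wraps the function; apply() then calls it once per group
by_partial = grouped.apply(functools.partial(diff_calc, periods=2))

# apply() also forwards extra keyword arguments, so partial is optional here
by_kwargs = grouped.apply(diff_calc, periods=2)
print(by_partial)
```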

Edit, following Karasinski's edit of 31/5:

When I run Karasinski's latest suggestion, I get this error:

mmm
mmm
nflx
vws.co
Traceback (most recent call last):

  File "", line 1, in <module>
    runfile('C:/Users/Morten/Documents/Design/Python/CrystalBall - Local - Git/Git - CrystalBall/sandbox/screener_test simple for StockOverflowNestedFct_Getstock.py', wdir='C:/Users/Morten/Documents/Design/Python/CrystalBall - Local - Git/Git - CrystalBall/sandbox')

  File "C:\Program Files\WinPython-64bit-3.3.5.7\python-3.3.5.amd64\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 682, in runfile
    execfile(filename, namespace)

  File "C:\Program Files\WinPython-64bit-3.3.5.7\python-3.3.5.amd64\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 85, in execfile
    exec(compile(open(filename, 'rb').read(), filename, 'exec'), namespace)

  File "C:/Users/Morten/Documents/Design/Python/CrystalBall - Local - Git/Git - CrystalBall/sandbox/screener_test simple for StockOverflowNestedFct_Getstock.py", line 173, in <module>
    df_all[['mean', 'max', 'median', 'min']] = df_all.groupby('Ticker').apply(group_func)

  File "C:\Program Files\WinPython-64bit-3.3.5.7\python-3.3.5.amd64\lib\site-packages\pandas\core\groupby.py", line 663, in apply
    return self._python_apply_general(f)

  File "C:\Program Files\WinPython-64bit-3.3.5.7\python-3.3.5.amd64\lib\site-packages\pandas\core\groupby.py", line 670, in _python_apply_general
    not_indexed_same=mutated)

  File "C:\Program Files\WinPython-64bit-3.3.5.7\python-3.3.5.amd64\lib\site-packages\pandas\core\groupby.py", line 2785, in _wrap_applied_output
    not_indexed_same=not_indexed_same)

  File "C:\Program Files\WinPython-64bit-3.3.5.7\python-3.3.5.amd64\lib\site-packages\pandas\core\groupby.py", line 1142, in _concat_objects
    result = result.reindex_axis(ax, axis=self.axis)

  File "C:\Program Files\WinPython-64bit-3.3.5.7\python-3.3.5.amd64\lib\site-packages\pandas\core\frame.py", line 2508, in reindex_axis
    fill_value=fill_value)

  File "C:\Program Files\WinPython-64bit-3.3.5.7\python-3.3.5.amd64\lib\site-packages\pandas\core\generic.py", line 1841, in reindex_axis
    {axis: [new_index, indexer]}, fill_value=fill_value, copy=copy)

  File "C:\Program Files\WinPython-64bit-3.3.5.7\python-3.3.5.amd64\lib\site-packages\pandas\core\generic.py", line 1865, in _reindex_with_indexers
    copy=copy)

  File "C:\Program Files\WinPython-64bit-3.3.5.7\python-3.3.5.amd64\lib\site-packages\pandas\core\internals.py", line 3144, in reindex_indexer
    raise ValueError("cannot reindex from a duplicate axis")

ValueError: cannot reindex from a duplicate axis
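This `ValueError` is most likely caused by the `Date` index: every ticker covers the same dates (2016-04-18 appears for both vws.co and nflx in the table at the top), so the index has duplicate labels and pandas cannot align the applied result back onto `df_all`. The sample data in the answer has non-overlapping dates per ticker, which is why it runs there. A minimal sketch of one workaround, assuming you can keep `Date` as an ordinary column: reset the index so every row has a unique label, then join the per-ticker results. `indicators` is a hypothetical stand-in for the answer's function.

```python
import pandas as pd

# Same dates for both tickers, as in the question's data
df_all = pd.DataFrame({
    'Date': pd.to_datetime(['2016-04-18', '2016-04-19', '2016-04-20'] * 2),
    'Ticker': ['vws.co'] * 3 + ['nflx'] * 3,
    'Adj_Close': [447.3, 450.9, 436.6, 108.4, 94.34, 96.77],
}).set_index('Date')
print(df_all.index.is_unique)  # False: each date occurs once per ticker


def indicators(group):
    # Rolling statistics over one ticker's rows
    roll = group['Adj_Close'].rolling(window=3)
    return pd.DataFrame({'mean': roll.mean(), 'max': roll.max(),
                         'median': roll.median(), 'min': roll.min()})


# Move Date back into a column so the row labels become a unique 0..n-1 range
df_flat = df_all.reset_index()

stats = df_flat.groupby('Ticker', group_keys=False)[['Adj_Close']].apply(indicators)
df_flat = df_flat.join(stats)  # aligns on the unique row labels
print(df_flat)
```

If you need `Date` as the index again afterwards, a `(Ticker, Date)` MultiIndex is unique where `Date` alone is not.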

John Karasinski · Accepted Answer

Using the setup from an answer to your previous question:

import pandas as pd
from io import StringIO  # Python 3; Python 2 used "from StringIO import StringIO"

text = """Date   Ticker        Open        High         Low   Adj_Close   Volume
2015-04-09  vws.co  315.000000  316.100000  312.500000  311.520000  1686800
2015-04-10  vws.co  317.000000  319.700000  316.400000  312.700000  1396500
2015-04-13  vws.co  317.900000  321.500000  315.200000  315.850000  1564500
2015-04-14  vws.co  320.000000  322.400000  318.700000  314.870000  1370600
2015-04-15  vws.co  320.000000  321.500000  319.200000  316.150000   945000
2015-04-16  vws.co  319.000000  320.200000  310.400000  307.870000  2236100
2015-04-17  vws.co  309.900000  310.000000  302.500000  299.100000  2711900
2015-04-20  vws.co  303.000000  312.000000  303.000000  306.490000  1629700
2016-03-31     mmm  166.750000  167.500000  166.500000  166.630005  1762800
2016-04-01     mmm  165.630005  167.740005  164.789993  167.529999  1993700
2016-04-04     mmm  167.110001  167.490005  165.919998  166.399994  2022800
2016-04-05     mmm  165.179993  166.550003  164.649994  165.809998  1610300
2016-04-06     mmm  165.339996  167.080002  164.839996  166.809998  2092200
2016-04-07     mmm  165.880005  167.229996  165.250000  167.160004  2721900"""

df = pd.read_csv(StringIO(text), sep=r'\s+', parse_dates=[0], index_col=0)

You can then make a function which calculates whatever statistics you'd like, such as:

def various_indicators(group):
    # group is the Adj_Close Series of a single ticker
    rolling = group.rolling(window=3)

    return pd.DataFrame({'mean': rolling.mean(),
                         'max': rolling.max(),
                         'median': rolling.median(),
                         'min': rolling.min()})

To assign these new columns to your DataFrame, group by ticker and apply the function:

# group_keys=False keeps the original Date index so the result aligns with df
df[['mean', 'max', 'median', 'min']] = df.groupby('Ticker', group_keys=False).Adj_Close.apply(various_indicators)
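As an alternative sketch (not part of the original answer), the same rolling statistics can be computed without a custom function by calling `.rolling()` directly on the grouped column. The result is indexed by (ticker, original row label), so the ticker level is dropped before joining. This assumes a unique row index; the rows below are copied from the sample data with a default integer index.

```python
import pandas as pd

df = pd.DataFrame({
    'Ticker': ['vws.co'] * 4 + ['mmm'] * 4,
    'Adj_Close': [311.52, 312.70, 315.85, 314.87,
                  166.630005, 167.529999, 166.399994, 165.809998],
})

# One rolling window per ticker; each statistic reuses the same window object
roll = df.groupby('Ticker')['Adj_Close'].rolling(window=3)
stats = pd.DataFrame({
    'mean': roll.mean(),
    'max': roll.max(),
    'median': roll.median(),
    'min': roll.min(),
})

# Index is (Ticker, original row label); drop the Ticker level to align with df
stats = stats.reset_index(level=0, drop=True)
df = df.join(stats)
print(df)
```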

EDIT

Regarding your follow-up questions in the comments: to extract additional information from the DataFrame, pass the entire group rather than just the single column.

def group_func(group):
    # Inside groupby.apply, group.name holds this group's key (the ticker)
    ticker = group.name
    adj_close = group.Adj_Close

    return Screener(ticker, adj_close)

def Screener(ticker, adj_close):
    print(ticker)

    rolling = adj_close.rolling(window=3)

    return pd.DataFrame({'mean': rolling.mean(),
                         'max': rolling.max(),
                         'median': rolling.median(),
                         'min': rolling.min()})

You can then assign these columns in the same way as above:

# group_keys=False keeps the original Date index so the result aligns with df
df[['mean', 'max', 'median', 'min']] = df.groupby('Ticker', group_keys=False).apply(group_func)
