Reputation: 143
My pandas DataFrame looks like this:
Date Ticker Open High Low Adj Close Adj_Close Volume
2016-04-18 vws.co 445.0 449.2 441.7 447.3 447.3 945300
2016-04-19 vws.co 449.0 455.8 448.3 450.9 450.9 907700
2016-04-20 vws.co 451.0 452.5 435.4 436.6 436.6 1268100
2016-04-21 vws.co 440.1 442.9 428.4 435.5 435.5 1308300
2016-04-22 vws.co 435.5 435.5 435.5 435.5 435.5 0
2016-04-25 vws.co 431.0 436.7 424.4 430.0 430.0 1311700
2016-04-18 nflx 109.9 110.7 106.02 108.4 108.4 27001500
2016-04-19 nflx 99.49 101.37 94.2 94.34 94.34 55623900
2016-04-20 nflx 94.34 96.98 93.14 96.77 96.77 25633600
2016-04-21 nflx 97.31 97.38 94.78 94.98 94.98 19859400
2016-04-22 nflx 94.85 96.69 94.21 95.9 95.9 15786000
2016-04-25 nflx 95.7 95.75 92.8 93.56 93.56 14965500
I have a program in which one of the functions, which itself contains nested functions, successfully runs a groupby.
The line looks like this:
df['MA3'] = df.groupby('Ticker').Adj_Close.transform(lambda group: pd.rolling_mean(group, window=3))
See my initial question and the data format here:
It has now dawned on me that, rather than doing the groupby in each of my 5 nested functions, I would rather run the groupby once in the main program that calls the top function, so that all the nested functions can work on the already-filtered DataFrame.
How do I apply my main function with groupby so that I only work on one ticker (one value in the 'Ticker' column) at a time?
The 'Ticker' column contains company identifiers such as 'aapl', 'msft' and 'nflx', with time-series data for a time window.
Thanks a lot, Karasinski. This is close to what I want, but I get an error.
When I run:
def Screener(df_all, group):
    # Copy df_all to df for single ticker operations
    df = df_all.copy()
    def diff_calc(df,ticker):
        df['Difference'] = df['Adj_Close'].diff()
        return df
    df = diff_calc(df, ticker)
    return df_all
for ticker in stocklist:
    df_all[['Difference']] = df_all.groupby('Ticker').Adj_Close.apply(Screener, ticker)
I get this error:
Traceback (most recent call last):
File "<ipython-input-2-d7c1835f6b2a>", line 1, in <module>
runfile('C:/Users/Morten/Documents/Design/Python/CrystalBall - Local - Git/Git - CrystalBall/sandbox/screener_test simple for StockOverflowNestedFct_Getstock.py', wdir='C:/Users/Morten/Documents/Design/Python/CrystalBall - Local - Git/Git - CrystalBall/sandbox')
File "C:\Program Files\WinPython-64bit-3.3.5.7\python-3.3.5.amd64\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 682, in runfile
execfile(filename, namespace)
File "C:\Program Files\WinPython-64bit-3.3.5.7\python-3.3.5.amd64\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 85, in execfile
exec(compile(open(filename, 'rb').read(), filename, 'exec'), namespace)
File "C:/Users/Morten/Documents/Design/Python/CrystalBall - Local - Git/Git - CrystalBall/sandbox/screener_test simple for StockOverflowNestedFct_Getstock.py", line 144, in <module>
df_all[['Difference']] = df_all.groupby('Ticker').Adj_Close.apply(Screener, ticker)
File "C:\Program Files\WinPython-64bit-3.3.5.7\python-3.3.5.amd64\lib\site-packages\pandas\core\groupby.py", line 663, in apply
return self._python_apply_general(f)
File "C:\Program Files\WinPython-64bit-3.3.5.7\python-3.3.5.amd64\lib\site-packages\pandas\core\groupby.py", line 667, in _python_apply_general
self.axis)
File "C:\Program Files\WinPython-64bit-3.3.5.7\python-3.3.5.amd64\lib\site-packages\pandas\core\groupby.py", line 1286, in apply
res = f(group)
File "C:\Program Files\WinPython-64bit-3.3.5.7\python-3.3.5.amd64\lib\site-packages\pandas\core\groupby.py", line 659, in f
return func(g, *args, **kwargs)
File "C:/Users/Morten/Documents/Design/Python/CrystalBall - Local - Git/Git - CrystalBall/sandbox/screener_test simple for StockOverflowNestedFct_Getstock.py", line 112, in Screener
df = diff_calc(df, ticker)
File "C:/Users/Morten/Documents/Design/Python/CrystalBall - Local - Git/Git - CrystalBall/sandbox/screener_test simple for StockOverflowNestedFct_Getstock.py", line 70, in diff_calc
df['Difference'] = df['Adj_Close'].diff()
File "C:\Program Files\WinPython-64bit-3.3.5.7\python-3.3.5.amd64\lib\site-packages\pandas\core\series.py", line 514, in __getitem__
result = self.index.get_value(self, key)
File "C:\Program Files\WinPython-64bit-3.3.5.7\python-3.3.5.amd64\lib\site-packages\pandas\tseries\index.py", line 1221, in get_value
raise KeyError(key)
KeyError: 'Adj_Close'
And when I use functools like so:
df_all = functools.partial(df_all.groupby('Ticker').Adj_Close.apply(Screener, ticker))
I get the same error as above...
Traceback (most recent call last):
File "<ipython-input-5-d7c1835f6b2a>", line 1, in <module>
runfile('C:/Users/Morten/Documents/Design/Python/CrystalBall - Local - Git/Git - CrystalBall/sandbox/screener_test simple for StockOverflowNestedFct_Getstock.py', wdir='C:/Users/Morten/Documents/Design/Python/CrystalBall - Local - Git/Git - CrystalBall/sandbox')
File "C:\Program Files\WinPython-64bit-3.3.5.7\python-3.3.5.amd64\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 682, in runfile
execfile(filename, namespace)
File "C:\Program Files\WinPython-64bit-3.3.5.7\python-3.3.5.amd64\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 85, in execfile
exec(compile(open(filename, 'rb').read(), filename, 'exec'), namespace)
File "C:/Users/Morten/Documents/Design/Python/CrystalBall - Local - Git/Git - CrystalBall/sandbox/screener_test simple for StockOverflowNestedFct_Getstock.py", line 148, in <module>
df_all = functools.partial(df_all.groupby('Ticker').Adj_Close.apply(Screener, [ticker]))
File "C:\Program Files\WinPython-64bit-3.3.5.7\python-3.3.5.amd64\lib\site-packages\pandas\core\groupby.py", line 663, in apply
return self._python_apply_general(f)
File "C:\Program Files\WinPython-64bit-3.3.5.7\python-3.3.5.amd64\lib\site-packages\pandas\core\groupby.py", line 667, in _python_apply_general
self.axis)
File "C:\Program Files\WinPython-64bit-3.3.5.7\python-3.3.5.amd64\lib\site-packages\pandas\core\groupby.py", line 1286, in apply
res = f(group)
File "C:\Program Files\WinPython-64bit-3.3.5.7\python-3.3.5.amd64\lib\site-packages\pandas\core\groupby.py", line 659, in f
return func(g, *args, **kwargs)
File "C:/Users/Morten/Documents/Design/Python/CrystalBall - Local - Git/Git - CrystalBall/sandbox/screener_test simple for StockOverflowNestedFct_Getstock.py", line 114, in Screener
df = diff_calc(df, ticker)
File "C:/Users/Morten/Documents/Design/Python/CrystalBall - Local - Git/Git - CrystalBall/sandbox/screener_test simple for StockOverflowNestedFct_Getstock.py", line 72, in diff_calc
df['Difference'] = df['Adj_Close'].diff()
File "C:\Program Files\WinPython-64bit-3.3.5.7\python-3.3.5.amd64\lib\site-packages\pandas\core\series.py", line 514, in __getitem__
result = self.index.get_value(self, key)
File "C:\Program Files\WinPython-64bit-3.3.5.7\python-3.3.5.amd64\lib\site-packages\pandas\tseries\index.py", line 1221, in get_value
raise KeyError(key)
KeyError: 'Adj_Close'
Edit in response to Karasinski's edit of 31/5:
When I run the last suggestion from Karasinski, I get this error:
mmm
mmm
nflx
vws.co
Traceback (most recent call last):
File "<ipython-input-4-d7c1835f6b2a>", line 1, in <module>
runfile('C:/Users/Morten/Documents/Design/Python/CrystalBall - Local - Git/Git - CrystalBall/sandbox/screener_test simple for StockOverflowNestedFct_Getstock.py', wdir='C:/Users/Morten/Documents/Design/Python/CrystalBall - Local - Git/Git - CrystalBall/sandbox')
File "C:\Program Files\WinPython-64bit-3.3.5.7\python-3.3.5.amd64\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 682, in runfile
execfile(filename, namespace)
File "C:\Program Files\WinPython-64bit-3.3.5.7\python-3.3.5.amd64\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 85, in execfile
exec(compile(open(filename, 'rb').read(), filename, 'exec'), namespace)
File "C:/Users/Morten/Documents/Design/Python/CrystalBall - Local - Git/Git - CrystalBall/sandbox/screener_test simple for StockOverflowNestedFct_Getstock.py", line 173, in <module>
df_all[['mean', 'max', 'median', 'min']] = df_all.groupby('Ticker').apply(group_func)
File "C:\Program Files\WinPython-64bit-3.3.5.7\python-3.3.5.amd64\lib\site-packages\pandas\core\groupby.py", line 663, in apply
return self._python_apply_general(f)
File "C:\Program Files\WinPython-64bit-3.3.5.7\python-3.3.5.amd64\lib\site-packages\pandas\core\groupby.py", line 670, in _python_apply_general
not_indexed_same=mutated)
File "C:\Program Files\WinPython-64bit-3.3.5.7\python-3.3.5.amd64\lib\site-packages\pandas\core\groupby.py", line 2785, in _wrap_applied_output
not_indexed_same=not_indexed_same)
File "C:\Program Files\WinPython-64bit-3.3.5.7\python-3.3.5.amd64\lib\site-packages\pandas\core\groupby.py", line 1142, in _concat_objects
result = result.reindex_axis(ax, axis=self.axis)
File "C:\Program Files\WinPython-64bit-3.3.5.7\python-3.3.5.amd64\lib\site-packages\pandas\core\frame.py", line 2508, in reindex_axis
fill_value=fill_value)
File "C:\Program Files\WinPython-64bit-3.3.5.7\python-3.3.5.amd64\lib\site-packages\pandas\core\generic.py", line 1841, in reindex_axis
{axis: [new_index, indexer]}, fill_value=fill_value, copy=copy)
File "C:\Program Files\WinPython-64bit-3.3.5.7\python-3.3.5.amd64\lib\site-packages\pandas\core\generic.py", line 1865, in _reindex_with_indexers
copy=copy)
File "C:\Program Files\WinPython-64bit-3.3.5.7\python-3.3.5.amd64\lib\site-packages\pandas\core\internals.py", line 3144, in reindex_indexer
raise ValueError("cannot reindex from a duplicate axis")
ValueError: cannot reindex from a duplicate axis
Upvotes: 0
Views: 327
Reputation: 1006
Borrowing the setup from an answer to your previous question, we can start with:
import pandas as pd
from io import StringIO  # Python 3 location of StringIO (the traceback shows Python 3.3)
text = """Date Ticker Open High Low Adj_Close Volume
2015-04-09 vws.co 315.000000 316.100000 312.500000 311.520000 1686800
2015-04-10 vws.co 317.000000 319.700000 316.400000 312.700000 1396500
2015-04-13 vws.co 317.900000 321.500000 315.200000 315.850000 1564500
2015-04-14 vws.co 320.000000 322.400000 318.700000 314.870000 1370600
2015-04-15 vws.co 320.000000 321.500000 319.200000 316.150000 945000
2015-04-16 vws.co 319.000000 320.200000 310.400000 307.870000 2236100
2015-04-17 vws.co 309.900000 310.000000 302.500000 299.100000 2711900
2015-04-20 vws.co 303.000000 312.000000 303.000000 306.490000 1629700
2016-03-31 mmm 166.750000 167.500000 166.500000 166.630005 1762800
2016-04-01 mmm 165.630005 167.740005 164.789993 167.529999 1993700
2016-04-04 mmm 167.110001 167.490005 165.919998 166.399994 2022800
2016-04-05 mmm 165.179993 166.550003 164.649994 165.809998 1610300
2016-04-06 mmm 165.339996 167.080002 164.839996 166.809998 2092200
2016-04-07 mmm 165.880005 167.229996 165.250000 167.160004 2721900"""
df = pd.read_csv(StringIO(text), sep=r'\s+', parse_dates=[0], index_col=0)
You can then make a function which calculates whatever statistics you'd like, such as:
def various_indicators(group):
    mean = pd.rolling_mean(group, window=3)
    max = pd.rolling_max(group, window=3)
    median = pd.rolling_median(group, window=3)
    min = pd.rolling_min(group, window=3)
    return pd.DataFrame({'mean': mean,
                         'max': max,
                         'median': median,
                         'min': min})
To assign these new columns to your dataframe, you would then do a groupby and then apply the function:
df[['mean', 'max', 'median', 'min']] = df.groupby('Ticker').Adj_Close.apply(various_indicators)
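On recent pandas releases the `pd.rolling_*` helpers no longer exist, so the same idea can be sketched with `Series.rolling` instead. This is only a minimal illustration under a few assumptions: the small frame below is made up (it reuses the column names and a few prices from the setup above), the index is unique, and `group_keys=False` is passed so that the result keeps the original index and lines up with `df` on assignment.

```python
import pandas as pd

# Small illustrative frame; the column names follow the setup above.
df = pd.DataFrame(
    {
        "Ticker": ["vws.co"] * 4 + ["mmm"] * 4,
        "Adj_Close": [311.52, 312.70, 315.85, 314.87,
                      166.63, 167.53, 166.40, 165.81],
    },
    index=pd.to_datetime(
        ["2015-04-09", "2015-04-10", "2015-04-13", "2015-04-14",
         "2016-03-31", "2016-04-01", "2016-04-04", "2016-04-05"]
    ),
)


def various_indicators(group):
    # group is the Adj_Close Series of a single ticker.
    window = group.rolling(window=3)
    return pd.DataFrame({
        "mean": window.mean(),
        "max": window.max(),
        "median": window.median(),
        "min": window.min(),
    })


# group_keys=False keeps the original (unique) index on the result,
# so the column assignment below aligns row by row with df.
stats = df.groupby("Ticker", group_keys=False)["Adj_Close"].apply(various_indicators)
df[["mean", "max", "median", "min"]] = stats
```

The first two rows of each ticker come out as NaN because the 3-row window is not yet full, and no values leak from one ticker into the next.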
EDIT
In regards to your further questions in the comments of the answer: To extract additional information from the dataframe, you should instead pass the entire group rather than just the single column.
def group_func(group):
    ticker = group.Ticker.unique()[0]
    adj_close = group.Adj_Close
    return Screener(ticker, adj_close)
def Screener(ticker, adj_close):
    print(ticker)
    mean = pd.rolling_mean(adj_close, window=3)
    max = pd.rolling_max(adj_close, window=3)
    median = pd.rolling_median(adj_close, window=3)
    min = pd.rolling_min(adj_close, window=3)
    return pd.DataFrame({'mean': mean,
                         'max': max,
                         'median': median,
                         'min': min})
You can then assign these columns in the same way as above:
df[['mean', 'max', 'median', 'min']] = df.groupby('Ticker').apply(group_func)
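If several tickers share the same dates, a Date index contains duplicates, which is a likely cause of a "cannot reindex from a duplicate axis" error when the applied result is put back together. One way around it, sketched below under stated assumptions, is to keep a unique integer index, group once, and hand each single-ticker frame to the top-level function. The function `screener` here is a hypothetical, simplified stand-in for your `Screener`, and the data is a made-up excerpt using the question's column names.

```python
import pandas as pd

# Made-up data in which both tickers share the same dates. Date is kept
# as a regular column, so the default integer index stays unique.
df_all = pd.DataFrame({
    "Date": pd.to_datetime(["2016-04-18", "2016-04-19", "2016-04-20"] * 2),
    "Ticker": ["vws.co"] * 3 + ["nflx"] * 3,
    "Adj_Close": [447.3, 450.9, 436.6, 108.4, 94.34, 96.77],
})


def screener(df):
    # df holds the rows of one ticker only; work on a copy so the
    # caller's frame is not modified in place.
    df = df.copy()
    df["Difference"] = df["Adj_Close"].diff()
    return df


# Group once, pass each single-ticker frame to the top-level function,
# then concatenate the pieces and restore the original row order.
pieces = [screener(group) for _, group in df_all.groupby("Ticker")]
result = pd.concat(pieces).sort_index()
```

Any nested functions called from `screener` then only ever see one ticker's rows, and the groupby runs a single time in the main program.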
Upvotes: 1