Reputation: 4864
Suppose we have a table of (for simplicity)historical stock data, which has columns like
timestamp ticker open high low close
and I want to have a column which has the percentage return for ticker on the date.
Now, the obvious first step is make a multi-index of timestamp and ticker, and then for each ticker one can do something like:
df.loc[idx[:, ['AAPL']], idx['ret']]=df.loc[idx[:, ['AAPL']], idx['close']].pct_chg()
What is not clear is how to iterate over the tickers in the most elegant way possible. One can make a list of tickers, and just do a plain python for loop, iterating the above over all of them, but this seems a little clunky. is there a more "pandas" way of doing this?
To answer the comment, for simplicity, let's just have the close, and suppose the data looks like
date ticker close
Jan 1 A 10
Jan 2 A 20
Jan 3 A 30
Jan 1 B 15
Jan 2 B 15
Jan 3 B 7.5
The result should be
date ticker close return
Jan 1 A 10 NaN
Jan 2 A 20 1.0
Jan 3 A 30 0.5
Jan 1 B 15 NaN
Jan 2 B 15 0
Jan 3 B 7.5 -0.5
Upvotes: 2
Views: 1389
Reputation: 402263
It seems like a simple groupby
+ pct_change
call should be enough.
df['return'] = df.groupby(['ticker']).close.pct_change()
df
date ticker close return
0 Jan 1 A 10.0 NaN
1 Jan 2 A 20.0 1.0
2 Jan 3 A 30.0 0.5
3 Jan 1 B 15.0 NaN
4 Jan 2 B 15.0 0.0
5 Jan 3 B 7.5 -0.5
Alexander raised a good point in the comments. Your data should be sorted by date and ticker for this operation to make sense. If it isn't, make a sort_values
call first -
df = df.sort_values(['date', 'ticker'])
Upvotes: 2