Igor Rivin
Igor Rivin

Reputation: 4864

iterating over index of pandas dataframe

Suppose we have a table of (for simplicity)historical stock data, which has columns like

timestamp   ticker open high low close

and I want to have a column which has the percentage return for ticker on the date.

Now, the obvious first step is make a multi-index of timestamp and ticker, and then for each ticker one can do something like:

df.loc[idx[:, ['AAPL']], idx['ret']]=df.loc[idx[:, ['AAPL']], idx['close']].pct_chg()

What is not clear is how to iterate over the tickers in the most elegant way possible. One can make a list of tickers, and just do a plain python for loop, iterating the above over all of them, but this seems a little clunky. is there a more "pandas" way of doing this?

To answer the comment, for simplicity, let's just have the close, and suppose the data looks like

date ticker close
Jan 1 A     10
Jan 2 A     20
Jan 3 A     30
Jan 1 B     15
Jan 2 B     15
Jan 3 B     7.5

The result should be

date  ticker close return
Jan 1 A       10    NaN
Jan 2 A       20    1.0
Jan 3 A       30    0.5
Jan 1 B       15    NaN
Jan 2 B       15    0
Jan 3 B       7.5  -0.5

Upvotes: 2

Views: 1389

Answers (1)

cs95
cs95

Reputation: 402263

It seems like a simple groupby + pct_change call should be enough.

df['return'] = df.groupby(['ticker']).close.pct_change()
df

    date ticker  close  return
0  Jan 1      A   10.0     NaN
1  Jan 2      A   20.0     1.0
2  Jan 3      A   30.0     0.5
3  Jan 1      B   15.0     NaN
4  Jan 2      B   15.0     0.0
5  Jan 3      B    7.5    -0.5

Alexander raised a good point in the comments. Your data should be sorted by date and ticker for this operation to make sense. If it isn't, make a sort_values call first -

df = df.sort_values(['date', 'ticker'])

Upvotes: 2

Related Questions