Reputation: 183
My dataframe looks like this:
Date AAPL NFLX INTC
0 2008-01-02 27.834286 3.764286 25.350000
1 2008-01-03 27.847143 3.724286 24.670000
2 2008-01-04 25.721428 3.515714 22.670000
3 2008-01-07 25.377142 3.554286 22.879999
4 2008-01-08 24.464285 3.328571 22.260000
I want to run .pct_change(1) on each of the stock columns and append the 3 resulting columns to the end of the df.
I know how to do it when I already know the name of the column, say AAPL. With that prior knowledge, I can do df['AAPL_Ret'] = df.AAPL.pct_change(1)
But eventually I will have hundreds of stocks in the file, and the goal is to automate the calculation without having to type each ticker individually in the code. So I would have to iterate through all the columns and run .pct_change on each somehow.
Any ideas? Many thanks.
Upvotes: 3
Views: 579
Reputation: 3591
One thing you can do is
for col in df.columns:
    if not isstock(col):
        continue
    df["{}_Ret".format(col)] = df[col].pct_change(1)
I've included the isstock() check because you have the date as a column rather than the index, and you presumably don't want to take the percent change of the date. You can either define isstock() elsewhere or replace it in my code with whatever condition you want.
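As a concrete sketch, one hypothetical definition of isstock() is simply "anything that isn't the Date column" (a numeric-dtype check would also work):

```python
import pandas as pd

def isstock(col):
    # Hypothetical helper: treat every column except 'Date' as a ticker.
    # A dtype check like pd.api.types.is_numeric_dtype(df[col]) also works.
    return col != 'Date'

df = pd.DataFrame({
    'Date': pd.to_datetime(['2008-01-02', '2008-01-03']),
    'AAPL': [27.834286, 27.847143],
})

for col in df.columns:
    if not isstock(col):
        continue
    df["{}_Ret".format(col)] = df[col].pct_change(1)

print(df.columns.tolist())  # ['Date', 'AAPL', 'AAPL_Ret']
```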
Upvotes: -1
Reputation: 402423
Call pct_change on the whole DataFrame; adding the result back is then straightforward with concat.
df = df.set_index('Date')
pd.concat([df, df.pct_change(1).add_suffix('_Ret')], axis=1)
AAPL NFLX INTC AAPL_Ret NFLX_Ret INTC_Ret
Date
2008-01-02 27.834286 3.764286 25.350000 NaN NaN NaN
2008-01-03 27.847143 3.724286 24.670000 0.000462 -0.010626 -0.026824
2008-01-04 25.721428 3.515714 22.670000 -0.076335 -0.056003 -0.081070
2008-01-07 25.377142 3.554286 22.879999 -0.013385 0.010971 0.009263
2008-01-08 24.464285 3.328571 22.260000 -0.035972 -0.063505 -0.027098
The only thing to watch out for (and you've probably figured this out already) is that pct_change
won't work if you have non-numeric columns.
Therefore, another (better) alternative would be
pd.concat([df, df.select_dtypes(exclude=[object]).pct_change(1)], axis=1)
Date AAPL NFLX INTC AAPL NFLX INTC
0 2008-01-02 27.834286 3.764286 25.350000 NaN NaN NaN
1 2008-01-03 27.847143 3.724286 24.670000 0.000462 -0.010626 -0.026824
2 2008-01-04 25.721428 3.515714 22.670000 -0.076335 -0.056003 -0.081070
3 2008-01-07 25.377142 3.554286 22.879999 -0.013385 0.010971 0.009263
4 2008-01-08 24.464285 3.328571 22.260000 -0.035972 -0.063505 -0.027098
This handles non-numeric columns gracefully when you have many of them.
OTOH, I would recommend setting the Date to the index in advance (and using option 1), because that's usually what you'd do when working with stock data. Not always, though.
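Putting that recommendation together, a minimal end-to-end sketch (using the first few rows of the sample data):

```python
import pandas as pd

df = pd.DataFrame({
    'Date': pd.to_datetime(['2008-01-02', '2008-01-03', '2008-01-04']),
    'AAPL': [27.834286, 27.847143, 25.721428],
    'NFLX': [3.764286, 3.724286, 3.515714],
})

# Move Date into the index first, so every remaining column is numeric
df = df.set_index('Date')
out = pd.concat([df, df.pct_change(1).add_suffix('_Ret')], axis=1)
print(out.columns.tolist())  # ['AAPL', 'NFLX', 'AAPL_Ret', 'NFLX_Ret']
```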
Upvotes: 3
Reputation: 294258
Note that the periods argument to pct_change defaults to 1 and can be left out.
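A quick check of that default:

```python
import pandas as pd

s = pd.Series([25.35, 24.67, 22.67])
# pct_change() with no argument behaves like pct_change(1)
print(s.pct_change().equals(s.pct_change(1)))  # True
```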
drop the 'Date' column
df.join(df.drop(columns='Date').pct_change().add_suffix('_Ret'))
Date AAPL NFLX INTC AAPL_Ret NFLX_Ret INTC_Ret
0 2008-01-02 27.834286 3.764286 25.350000 NaN NaN NaN
1 2008-01-03 27.847143 3.724286 24.670000 0.000462 -0.010626 -0.026824
2 2008-01-04 25.721428 3.515714 22.670000 -0.076335 -0.056003 -0.081070
3 2008-01-07 25.377142 3.554286 22.879999 -0.013385 0.010971 0.009263
4 2008-01-08 24.464285 3.328571 22.260000 -0.035972 -0.063505 -0.027098
set_index
df.join(df.set_index('Date').pct_change().add_suffix('_Ret'), on='Date')
Date AAPL NFLX INTC AAPL_Ret NFLX_Ret INTC_Ret
0 2008-01-02 27.834286 3.764286 25.350000 NaN NaN NaN
1 2008-01-03 27.847143 3.724286 24.670000 0.000462 -0.010626 -0.026824
2 2008-01-04 25.721428 3.515714 22.670000 -0.076335 -0.056003 -0.081070
3 2008-01-07 25.377142 3.554286 22.879999 -0.013385 0.010971 0.009263
4 2008-01-08 24.464285 3.328571 22.260000 -0.035972 -0.063505 -0.027098
Upvotes: 3