Reputation: 183
My dataframe looks like this:
Date AAPL NFLX INTC
0 2008-01-02 27.834286 3.764286 25.350000
1 2008-01-03 27.847143 3.724286 24.670000
2 2008-01-04 25.721428 3.515714 22.670000
3 2008-01-07 25.377142 3.554286 22.879999
4 2008-01-08 24.464285 3.328571 22.260000
I want to run .pct_change(1) on each of the stock columns and append the 3 resulting columns to the end of the df.
I know how to do it when I already know the name of the column, say AAPL. With that prior knowledge, I can do df['AAPL_Ret'] = df.AAPL.pct_change(1)
But eventually I will have hundreds of stocks in the file, and the goal is to automate the calculation without having to type each ticker individually in the code. So I would have to iterate through all the columns and run .pct_change on each somehow.
Any ideas? Many thanks.
Upvotes: 3
Views: 579
Reputation: 3591
One thing you can do is
for col in df.columns:
    if not isstock(col):
        continue
    df["{}_Ret".format(col)] = df[col].pct_change(1)
I've included the isstock() check because you have the date as a column rather than the index, and you presumably don't want to take the percent change of the date. You can either define isstock() elsewhere or replace it in my code with whatever condition you want.
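As a concrete sketch, one hypothetical definition of isstock() is simply "anything that isn't the Date column" (a numeric-dtype check would also work):

```python
import pandas as pd

def isstock(col):
    # Hypothetical helper: treat every column except 'Date' as a ticker.
    # A dtype check like pd.api.types.is_numeric_dtype(df[col]) also works.
    return col != 'Date'

df = pd.DataFrame({
    'Date': pd.to_datetime(['2008-01-02', '2008-01-03']),
    'AAPL': [27.834286, 27.847143],
})

for col in df.columns:
    if not isstock(col):
        continue
    df["{}_Ret".format(col)] = df[col].pct_change(1)

print(df.columns.tolist())  # ['Date', 'AAPL', 'AAPL_Ret']
```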
Upvotes: -1
Reputation: 402423
Call pct_change on the whole DataFrame; adding the result back is then straightforward with concat.
df = df.set_index('Date')
pd.concat([df, df.pct_change(1).add_suffix('_Ret')], axis=1)
AAPL NFLX INTC AAPL_Ret NFLX_Ret INTC_Ret
Date
2008-01-02 27.834286 3.764286 25.350000 NaN NaN NaN
2008-01-03 27.847143 3.724286 24.670000 0.000462 -0.010626 -0.026824
2008-01-04 25.721428 3.515714 22.670000 -0.076335 -0.056003 -0.081070
2008-01-07 25.377142 3.554286 22.879999 -0.013385 0.010971 0.009263
2008-01-08 24.464285 3.328571 22.260000 -0.035972 -0.063505 -0.027098
The only thing to watch out for (and you've probably figured this out already) is that pct_change
won't work if you have non-numeric columns.
Therefore, another (better) alternative would be
pd.concat([df, df.select_dtypes(exclude=[object]).pct_change(1)], axis=1)
Date AAPL NFLX INTC AAPL NFLX INTC
0 2008-01-02 27.834286 3.764286 25.350000 NaN NaN NaN
1 2008-01-03 27.847143 3.724286 24.670000 0.000462 -0.010626 -0.026824
2 2008-01-04 25.721428 3.515714 22.670000 -0.076335 -0.056003 -0.081070
3 2008-01-07 25.377142 3.554286 22.879999 -0.013385 0.010971 0.009263
4 2008-01-08 24.464285 3.328571 22.260000 -0.035972 -0.063505 -0.027098
This handles non-numeric columns gracefully when you have many of them.
OTOH, I would recommend setting the Date to the index in advance (and using option 1), because that's usually what you'd do when working with stock data. Not always, though.
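Putting that recommendation together, a minimal end-to-end sketch (using the first few rows of the sample data):

```python
import pandas as pd

df = pd.DataFrame({
    'Date': pd.to_datetime(['2008-01-02', '2008-01-03', '2008-01-04']),
    'AAPL': [27.834286, 27.847143, 25.721428],
    'NFLX': [3.764286, 3.724286, 3.515714],
})

# Move Date into the index first, so every remaining column is numeric
df = df.set_index('Date')
out = pd.concat([df, df.pct_change(1).add_suffix('_Ret')], axis=1)
print(out.columns.tolist())  # ['AAPL', 'NFLX', 'AAPL_Ret', 'NFLX_Ret']
```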
Upvotes: 3
Reputation: 294258
Note that the periods argument to pct_change defaults to 1 and can be left out.
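A quick check of that default:

```python
import pandas as pd

s = pd.Series([25.35, 24.67, 22.67])
# pct_change() with no argument behaves like pct_change(1)
print(s.pct_change().equals(s.pct_change(1)))  # True
```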
drop the 'Date' column
df.join(df.drop(columns='Date').pct_change().add_suffix('_Ret'))
Date AAPL NFLX INTC AAPL_Ret NFLX_Ret INTC_Ret
0 2008-01-02 27.834286 3.764286 25.350000 NaN NaN NaN
1 2008-01-03 27.847143 3.724286 24.670000 0.000462 -0.010626 -0.026824
2 2008-01-04 25.721428 3.515714 22.670000 -0.076335 -0.056003 -0.081070
3 2008-01-07 25.377142 3.554286 22.879999 -0.013385 0.010971 0.009263
4 2008-01-08 24.464285 3.328571 22.260000 -0.035972 -0.063505 -0.027098
set_index
df.join(df.set_index('Date').pct_change().add_suffix('_Ret'), on='Date')
Date AAPL NFLX INTC AAPL_Ret NFLX_Ret INTC_Ret
0 2008-01-02 27.834286 3.764286 25.350000 NaN NaN NaN
1 2008-01-03 27.847143 3.724286 24.670000 0.000462 -0.010626 -0.026824
2 2008-01-04 25.721428 3.515714 22.670000 -0.076335 -0.056003 -0.081070
3 2008-01-07 25.377142 3.554286 22.879999 -0.013385 0.010971 0.009263
4 2008-01-08 24.464285 3.328571 22.260000 -0.035972 -0.063505 -0.027098
Upvotes: 3