Pythonic way to add multiple calculated columns to a data frame?

Question

Is there a more succinct / pythonic / pandas-native way of writing the following?

all_pos = ['NN', 'VB', 'ADJ']
for col in all_pos:
    df_out['delta_'+col] = df_out[col] - df_mean[col]

df_out and df_mean share the same column names and index, and I want to create new columns in df_out containing the difference between them.

So df_out could contain

Index  NN VB ADJ

239    9  4  3
250    2  2  1

And df_mean could contain

Index  NN VB ADJ

239    3  1  8
250    7  4  3

I would want df_out to look like

Index  NN VB ADJ delta_NN delta_VB delta_ADJ

239    9  4  3   6        3        -5
250    2  2  1   -5       -2       -2

mozway · Accepted Answer

Subtract the two frames directly (no need to loop over the columns), then concatenate the input with the result:

pd.concat([df_out,
           (df_out - df_mean).add_prefix('delta_')
          ], axis=1)

or

df_out.join((df_out - df_mean).add_prefix('delta_'))

(df_out - df_mean) can also be written as df_out.sub(df_mean).

output:

       NN  VB  ADJ  delta_NN  delta_VB  delta_ADJ
Index                                            
239     9   4    3         6         3         -5
250     2   2    1        -5        -2         -2
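
To reproduce that output end to end, here is a minimal sketch that rebuilds the sample frames from the question and runs both variants. The intermediate names delta, via_concat and via_join are only illustrative, and "Index" is assumed to be the index of both frames.

```python
import pandas as pd

# Sample frames from the question; "Index" is assumed to be the index.
idx = pd.Index([239, 250], name="Index")
df_out = pd.DataFrame({"NN": [9, 2], "VB": [4, 2], "ADJ": [3, 1]}, index=idx)
df_mean = pd.DataFrame({"NN": [3, 7], "VB": [1, 4], "ADJ": [8, 3]}, index=idx)

# Subtract every column at once, then prefix the resulting column names.
delta = df_out.sub(df_mean).add_prefix("delta_")

# Both forms return a new frame; df_out itself is left unchanged.
via_concat = pd.concat([df_out, delta], axis=1)
via_join = df_out.join(delta)

print(via_join)
```

Since neither call modifies df_out in place, assign the result back (df_out = df_out.join(delta)) if the new columns should live in df_out.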

NB. I assumed "Index" is the index; if it is a regular column instead, first run:

df_out.set_index('Index', inplace=True)
df_mean.set_index('Index', inplace=True)
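
If df_out carries columns beyond those listed in all_pos, subtracting the whole frames would also involve those extra columns. A possible variant is to select the columns of interest first. This is only a sketch: the "token" column is a hypothetical extra column added for illustration, and result is an illustrative name.

```python
import pandas as pd

all_pos = ["NN", "VB", "ADJ"]
idx = pd.Index([239, 250], name="Index")

# "token" is a hypothetical non-numeric column that should not be subtracted.
df_out = pd.DataFrame(
    {"token": ["a", "b"], "NN": [9, 2], "VB": [4, 2], "ADJ": [3, 1]}, index=idx
)
df_mean = pd.DataFrame({"NN": [3, 7], "VB": [1, 4], "ADJ": [8, 3]}, index=idx)

# Restrict both frames to the columns in all_pos before subtracting,
# then join the prefixed differences back onto the full frame.
result = df_out.join(df_out[all_pos].sub(df_mean[all_pos]).add_prefix("delta_"))

print(result)
```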
