Reputation: 59
Is there a more succinct / pythonic / pandas-native way of writing the following?
all_pos = ['NN', 'VB', 'ADJ']
for col in all_pos:
    df_out['delta_' + col] = df_out[col] - df_mean[col]
df_out and df_mean contain the same column names and indices, and I want to create new columns in df_out containing the difference between them.
So df_out could contain
Index  NN  VB  ADJ
239     9   4    3
250     2   2    1
And df_mean could contain
Index  NN  VB  ADJ
239     3   1    8
250     7   4    3
I would want df_out to look like
Index  NN  VB  ADJ  delta_NN  delta_VB  delta_ADJ
239     9   4    3         6         3         -5
250     2   2    1        -5        -2         -2
Upvotes: 0
Views: 60
Reputation: 260500
Use a simple subtraction (no need to do it per column) and concat the input and output:
pd.concat([df_out, (df_out - df_mean).add_prefix('delta_')], axis=1)
or
df_out.join((df_out - df_mean).add_prefix('delta_'))
(df_out - df_mean) can also be written df_out.sub(df_mean).
Output:
       NN  VB  ADJ  delta_NN  delta_VB  delta_ADJ
Index
239     9   4    3         6         3         -5
250     2   2    1        -5        -2         -2
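For anyone who wants to try this end to end, here is a minimal self-contained sketch using the sample data from the question. It assumes "Index" is the row index, and it restricts the subtraction to the all_pos columns from the question so that any other columns the frames might carry are left alone (the variable names delta, result_concat and result_join are just illustrative):

```python
import pandas as pd

# Sample frames from the question; "Index" is assumed to be the row index.
idx = pd.Index([239, 250], name="Index")
df_out = pd.DataFrame({"NN": [9, 2], "VB": [4, 2], "ADJ": [3, 1]}, index=idx)
df_mean = pd.DataFrame({"NN": [3, 7], "VB": [1, 4], "ADJ": [8, 3]}, index=idx)

# Subtract only the columns of interest, then prefix the resulting column names.
all_pos = ["NN", "VB", "ADJ"]
delta = df_out[all_pos].sub(df_mean[all_pos]).add_prefix("delta_")

# Both variants attach the delta columns to the original frame.
result_concat = pd.concat([df_out, delta], axis=1)
result_join = df_out.join(delta)

print(result_concat)
```

Neither variant modifies df_out in place; assign the result back (df_out = df_out.join(delta)) if that is what you need.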
NB. I assumed "Index" is the index; if not, first run:
df_out.set_index('Index', inplace=True)
df_mean.set_index('Index', inplace=True)
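To illustrate why this matters, here is a small sketch assuming "Index" arrives as an ordinary column (the names raw_out, raw_mean, naive and result are made up for the example). Without set_index the "Index" column is subtracted like any other and you get a useless delta_Index column of zeros. The sketch uses the non-inplace form of set_index, which returns a new frame:

```python
import pandas as pd

# Hypothetical input where "Index" is a regular column, not the row index.
raw_out = pd.DataFrame({"Index": [239, 250], "NN": [9, 2], "VB": [4, 2], "ADJ": [3, 1]})
raw_mean = pd.DataFrame({"Index": [239, 250], "NN": [3, 7], "VB": [1, 4], "ADJ": [8, 3]})

# Subtracting directly also subtracts the "Index" column from itself.
naive = raw_out.join((raw_out - raw_mean).add_prefix("delta_"))

# Moving "Index" into the row index keeps it out of the arithmetic
# and lets pandas align the rows on it.
df_out = raw_out.set_index("Index")
df_mean = raw_mean.set_index("Index")
result = df_out.join(df_out.sub(df_mean).add_prefix("delta_"))

print(result)
```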
Upvotes: 2