pandas take pairwise difference between a lot of columns

Question

I have a pandas dataframe having a lot of actual (column name ending with _act) and projected columns (column name ending with _proj). Other than actual and projected there's also a date column. Now I want to add an error column (in that order, i.e., beside its projected column) for all of them. Sample dataframe:

date a_act a_proj b_act b_proj .... z_act z_proj
2020  10     5      9     11   ....   3     -1
.
.

What I want:

date a_act a_proj a_error b_act b_proj b_error .... z_act z_proj z_error
2020  10     5       5      9     11     -2    ....   3     -1     4
.
.

What's the best way to achieve this, as I have a lot of actual and projected columns?

Dani Mesejo · Accepted Answer

You could do:

df = df.set_index('date')

# create new columns
columns = df.columns[df.columns.str.endswith('act')].str.replace('act', 'error')

# compute differences
diffs = pd.DataFrame(data=df.values[:, ::2] - df.values[:, 1::2], index=df.index, columns=columns)

# concat
res = pd.concat((df, diffs), axis=1)

# reorder columns
res = res.reindex(sorted(res.columns), axis=1)
print(res)

Output

      a_act  a_error  a_proj  b_act  b_error  b_proj  z_act  z_error  z_proj
date                                                                        
2020     10        5       5      9       -2      11      3        4      -1

pandas take pairwise difference between a lot of columns

Answers (1)

Related Questions