Subtracting columns of dataframes on the basis of index of column names present in lists

Question

I have two lists (l1 and l2) consisting of required columns of two dataframes (df1 and df2) that I would like to apply operation upon.

One list has all required columns ending with an _x and the other with an _y.

I would like to subtract the values of these columns by index, for instance,

df_final['first_col_sub'] = first element of l2 - first element of l1
df_final['second_col_sub'] = second element of l2 - second element of l1 and so on

Actually both of the dataframes have the same column headers and I cannot directly use the column headers to perform the operation which is why I added _x and _y in the column headers so that it may help to do the subtraction.

For example,

df1,

    1_x | 2_x | 3_x | 4_x | 5_x ...
     1  |  2  |  3  |  4  |  5


df2,
    1_y | 2_y | 3_y | 4_y | 5_y ...
     5  |  4  |  3  |  2  |  1



df_final,
    first_col_sub | first_col_sub | first_col_sub | first_col_sub | first_col_sub ...
           4      |       2       |        0      |      -2       |       -4

How may I achieve this? Any help is greatly appreciated.

(If there is anything unclear please let me know.)

jezrael · Accepted Answer

If same number of rows and same number of columns with matched values before _ use:

l1 = ['1_x','2_x'...]
l2 = ['1_y','2_y'...]
df2[l2].sub(df1[l1].to_numpy())

Or better is remove _x and _y for match values before _ in subtract:

f = lambda x: x[:-2]
df2[l2].rename(columns=f).sub(df1[l1].rename(columns=f))

Also is possible filter by ending y and x columns in DataFrame.filter:

f = lambda x: x[:-2]
df2.filter(regex='y$').rename(columns=f).sub(df1.filter(regex='x$').rename(columns=f))

Or:

df22 = df2.filter(regex='y$')
df11 = df1.filter(regex='x$')
df22.columns = df22.columns.str[:-2]
df11.columns = df11.columns.str[:-2]

df22.sub(df11)

Subtracting columns of dataframes on the basis of index of column names present in lists

Answers (2)

Related Questions