Perform calculation on multiple columns at once with pandas

Question

I have a large DataFrame with more than 1 million rows. It currently has only the columns X, a, b, and c. I want to perform a calculation that yields three new columns: new_a, new_b, and new_c (see picture).

The calculation is: new_a = a/(X^2), and likewise for b and c.

I already have a way to do it in Python:

col_list = ['a','b','c']

def new(col,X):
    score = col/(X**2)
    return score

new_col = ['new_a','new_b','new_c']

def calculate(df):
    for i in range(len(new_col)):
        df[new_col[i]] = df.apply(lambda row: new(row[col_list[i]],row['X']),axis=1)

calculate(df)

Is there another way to achieve the same result? The current approach works, but it takes a long time to run and somehow yields weird results for certain operations. Thank you.

cs95 · Accepted Answer

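import pandas as pd

col_list = ['a','b','c']
# Divide each column in col_list by X**2 row-wise, prefix the results, and append them to df.
df = pd.concat(
    [df, df[col_list].div(df['X'] ** 2, axis=0).add_prefix('new_')], axis=1
)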

df
   X  a  b  c     new_a     new_b     new_c
0  5  3  4  5  0.120000  0.160000  0.200000
1  7  2  4  2  0.040816  0.081633  0.040816

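With axis=0, div aligns df['X'] ** 2 with the DataFrame's row index and divides every selected column by it in one vectorized operation. add_prefix then renames the resulting columns, and pd.concat with axis=1 appends them to the original DataFrame.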
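As a rough check, here is a self-contained sketch that rebuilds the two-row sample shown above and compares the vectorized division with a row-wise apply like the one in the question. The sample values come from the printed output. The names vectorized, row_wise, and result are made up for illustration, and df.join is shown only as one possible alternative to pd.concat.

```python
import numpy as np
import pandas as pd

# Sample data taken from the output shown in the answer.
df = pd.DataFrame({"X": [5, 7], "a": [3, 2], "b": [4, 4], "c": [5, 2]})
col_list = ["a", "b", "c"]

# Vectorized: divide every column in col_list by X**2, aligned on the row index.
vectorized = df[col_list].div(df["X"] ** 2, axis=0).add_prefix("new_")

# Row-wise apply, similar to the question's approach, kept only for comparison.
row_wise = pd.DataFrame(
    {
        f"new_{c}": df.apply(lambda row, c=c: row[c] / row["X"] ** 2, axis=1)
        for c in col_list
    }
)

# Both approaches should agree up to floating-point error.
assert np.allclose(vectorized.to_numpy(), row_wise.to_numpy())

# join attaches the new columns by index, much like pd.concat(..., axis=1).
result = df.join(vectorized)
print(result)
```

Because the division runs over whole columns instead of calling a Python function once per row, it should scale much better to a million rows, though the actual speedup depends on the data and the machine.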
