Reputation: 348
I have a large DataFrame with more than 1 million rows. It currently has only the columns X, a, b, c. I want to perform a calculation that yields new columns new_a, new_b, new_c (see picture).
The calculation is: new_a = a/(X^2), and likewise for b and c.
I already have a way to do it in Python:
col_list = ['a','b','c']
def new(col,X):
    score = col/(X**2)
    return score
new_col = ['new_a','new_b','new_c']
def calculate(df):
    for i in range(len(new_col)):
        df[new_col[i]] = df.apply(lambda row: new(row[col_list[i]],row['X']),axis=1)
calculate(df)
Is there another way to achieve the same goal? The current approach works, but it takes a long time to run and somehow yields weird results for certain operations. Thank you.
Upvotes: 2
Views: 5122
Reputation: 1004
Do you want a/X^2 or a/X? You ask for one but your example shows the other.
for col in col_list:
    new_col = 'new_' + col
    df[new_col] = df[col] / (df['X']**2)
This will give you what you asked for (a/X^2), computed on whole columns at once instead of row by row. If what you actually want is a/X, adjust the divisor accordingly.
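As a rough check that the column-wise loop matches the row-wise apply from the question, here is a minimal sketch. The two-row sample frame is made up for illustration (its values mirror the example rows that appear elsewhere in this thread), and the names slow and fast are hypothetical.

```python
import pandas as pd

# Hypothetical sample data, not the asker's real DataFrame.
df = pd.DataFrame({"X": [5, 7], "a": [3, 2], "b": [4, 4], "c": [5, 2]})
col_list = ["a", "b", "c"]

# Row-wise version, as in the question: one Python function call per row.
slow = df.copy()
for col in col_list:
    slow["new_" + col] = slow.apply(lambda row: row[col] / (row["X"] ** 2), axis=1)

# Column-wise version: arithmetic on whole Series, no per-row Python calls.
fast = df.copy()
for col in col_list:
    fast["new_" + col] = fast[col] / (fast["X"] ** 2)

# Raises an AssertionError if the two results differ.
pd.testing.assert_frame_equal(slow, fast)
print(fast)
```

On a frame with a million rows the column-wise loop should be much faster, since the arithmetic runs inside pandas/NumPy rather than in a Python-level loop over rows.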
Upvotes: 1
Reputation: 402413
import pandas as pd

col_list = ['a','b','c']
# Divide a, b, c by X**2 row by row (axis=0), then prefix the new column names
df = pd.concat(
    [df, df[col_list].div(df['X'] ** 2, axis=0).add_prefix('new_')], axis=1
)
df
   X  a  b  c     new_a     new_b     new_c
0  5  3  4  5  0.120000  0.160000  0.200000
1  7  2  4  2  0.040816  0.081633  0.040816
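With axis=0, pandas aligns the divisor with the rows by index label and divides every selected column by it in one vectorized step. After that, you only need to concatenate the result onto the original DataFrame.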
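To illustrate what "index-aligned" means here, the sketch below deliberately reverses the divisor's row order and shows that the result is unchanged, because rows are matched by label rather than by position. The sample frame is an assumption that reproduces the two rows printed above; divisor, scaled and out are illustrative names only.

```python
import pandas as pd

# Hypothetical sample data matching the two rows printed above.
df = pd.DataFrame({"X": [5, 7], "a": [3, 2], "b": [4, 4], "c": [5, 2]})
col_list = ["a", "b", "c"]

# Reverse the divisor's row order on purpose; its index labels travel with it.
divisor = (df["X"] ** 2).iloc[::-1]

# axis=0 matches the divisor to rows by index label, not by position,
# so row 0 is still divided by 25 and row 1 by 49.
scaled = df[col_list].div(divisor, axis=0).add_prefix("new_")

# concat along axis=1 also aligns on the index before placing columns side by side.
out = pd.concat([df, scaled], axis=1)
print(out)
```

One practical consequence: if the DataFrame has duplicate index labels, alignment can behave unexpectedly, so resetting the index first may be worth considering.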
Upvotes: 3