Reputation: 23
I am having trouble figuring out how to iterate over variables in a pandas dataframe and perform the same arithmetic operation on each.
I have a dataframe df that contains three numeric variables x1, x2 and x3. I want to create three new variables by multiplying each by 2. Here's what I am doing:
existing = ['x1','x2','x3']
new = ['y1','y2','y3']
for i in existing:
    for j in new:
        df[j] = df[i]*2
The above code does create three new variables y1, y2 and y3 in the dataframe. But the values of y1 and y2 are being overwritten by the values of y3, so all three variables have the same values, corresponding to those of y3. I am not sure what I am missing.
I would really appreciate any guidance or suggestions. Thanks.
Upvotes: 0
Views: 996
Reputation: 3639
You can concatenate the original DataFrame with the doubled columns:
cols_to_double = ['x0', 'x1', 'x2']
new_cols = list(df.columns) + [c.replace('x', 'y') for c in cols_to_double]
df = pd.concat([df, 2 * df[cols_to_double]], axis=1, copy=True)
df.columns = new_cols
So, if your input DataFrame df is:
   x0  x1  x2  other0  other1
0   0   1   2       3       4
1   0   1   2       3       4
2   0   1   2       3       4
3   0   1   2       3       4
4   0   1   2       3       4
after executing the previous lines, you get:
   x0  x1  x2  other0  other1  y0  y1  y2
0   0   1   2       3       4   0   2   4
1   0   1   2       3       4   0   2   4
2   0   1   2       3       4   0   2   4
3   0   1   2       3       4   0   2   4
4   0   1   2       3       4   0   2   4
Here is the code to create df:
import pandas as pd
import numpy as np
df = pd.DataFrame(
    data=np.column_stack([np.full((5,), i) for i in range(5)]),
    columns=[f'x{i}' for i in range(3)] + [f'other{i}' for i in range(2)]
)
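A variant of the same idea, as a minimal sketch: rename the doubled columns before concatenating, so the full column list never has to be rebuilt by hand. The small sample frame and the names doubled and out are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical sample data with three x columns and one unrelated column.
df = pd.DataFrame({'x0': [0, 0], 'x1': [1, 1], 'x2': [2, 2], 'other0': [3, 3]})
cols_to_double = ['x0', 'x1', 'x2']

# Double the selected columns, then rename x0 -> y0, x1 -> y1, x2 -> y2.
doubled = (2 * df[cols_to_double]).rename(columns=lambda c: c.replace('x', 'y'))

# Append the renamed columns to the right of the original ones.
out = pd.concat([df, doubled], axis=1)
print(out)
```

Because the rename happens on the doubled frame only, the original column names are left untouched.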
Upvotes: 0
Reputation: 1
I would do something more generic:
# existing = ['x1','x2','x3']
# select only the x columns so that other columns are not overwritten
existing = df.columns[df.columns.str.startswith('x')]
# .str.replace applies the replacement to each column name
new = existing.str.replace('x', 'y', regex=False)
for col_existing, col_new in zip(existing, new):
    df[col_new] = df[col_existing]*2
# there may be a more elegant way using the pandas assign function
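One possible way to do this with assign, as a sketch rather than a definitive answer: build a mapping from each new name to its doubled column and unpack it as keyword arguments. The sample values and the name result are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({'x1': [1, 2], 'x2': [3, 4], 'x3': [5, 6]})

existing = ['x1', 'x2', 'x3']
new = ['y1', 'y2', 'y3']

# Build a {new_name: doubled_column} mapping and unpack it into assign,
# which returns a new DataFrame and leaves df itself unchanged.
result = df.assign(**{n: df[e] * 2 for e, n in zip(existing, new)})
print(result)
```

Since assign returns a copy, reassign it (df = df.assign(...)) if the new columns should live on df.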
Upvotes: 0
Reputation: 16147
You are looping 9 times here, 3 times for each column, with each iteration overwriting the previous one, so every new column ends up holding the values from the last pass (x3 doubled).
You may want something like
for e, n in zip(existing,new):
    df[n] = df[e]*2
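To see the difference side by side, here is a small self-contained sketch. The sample values and the names nested and paired are assumptions, since the question does not show the actual data.

```python
import pandas as pd

# Hypothetical sample data; the question does not show the real values.
df = pd.DataFrame({'x1': [1, 2, 3], 'x2': [10, 20, 30], 'x3': [100, 200, 300]})

existing = ['x1', 'x2', 'x3']
new = ['y1', 'y2', 'y3']

# Nested loops visit all 3 x 3 = 9 (i, j) pairs, so each y column is
# assigned three times and keeps the last assignment, df['x3'] * 2.
nested = df.copy()
for i in existing:
    for j in new:
        nested[j] = nested[i] * 2

# zip pairs the names positionally: ('x1', 'y1'), ('x2', 'y2'), ('x3', 'y3').
paired = df.copy()
for e, n in zip(existing, new):
    paired[n] = paired[e] * 2

print(nested[new])
print(paired[new])
```

In nested, all three y columns hold x3 doubled; in paired, each y column holds its own x column doubled.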
Upvotes: 2