navi

Reputation: 23

Iterate over columns of Pandas dataframe and create new variables

I am having trouble figuring out how to iterate over the columns of a pandas dataframe and apply the same arithmetic operation to each.

I have a dataframe df that contains three numeric variables x1, x2 and x3. I want to create three new variables by multiplying each by 2. Here’s what I am doing:

existing = ['x1','x2','x3']
new = ['y1','y2','y3']

for i in existing:
    for j in new:
        df[j] = df[i]*2

The code above does create three new variables y1, y2 and y3 in the dataframe, but the values of y1 and y2 are being overwritten, so all three variables end up with the same values as y3. I am not sure what I am missing.

I would really appreciate any guidance or suggestions. Thanks.

Upvotes: 0

Views: 996

Answers (3)

PieCot

Reputation: 3639

You can concatenate the original DataFrame with a copy of the selected columns whose values have been doubled:

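cols_to_double = ['x0', 'x1', 'x2']
# Final column names: the original ones, then y0, y1, y2 for the doubled columns
new_cols = list(df.columns) + [c.replace('x', 'y') for c in cols_to_double]

# The doubled block still carries the x names here, so rename everything afterwards
df = pd.concat([df, 2 * df[cols_to_double]], axis=1)
df.columns = new_cols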

So, if your input DataFrame df is:

   x0  x1  x2  other0  other1
0   0   1   2       3       4
1   0   1   2       3       4
2   0   1   2       3       4
3   0   1   2       3       4
4   0   1   2       3       4

after executing the previous lines, you get:

   x0  x1  x2  other0  other1  y0  y1  y2
0   0   1   2       3       4   0   2   4
1   0   1   2       3       4   0   2   4
2   0   1   2       3       4   0   2   4
3   0   1   2       3       4   0   2   4
4   0   1   2       3       4   0   2   4

Here is the code to create df:

import pandas as pd
import numpy as np

df = pd.DataFrame(
    data=np.column_stack([np.full((5,), i) for i in range(5)]),
    columns=[f'x{i}' for i in range(3)] + [f'other{i}' for i in range(2)]
)

Upvotes: 0

Almogx3

Reputation: 1

I would do something more generic:

# existing = ['x1', 'x2', 'x3']
existing = df.columns  # assumes every column is numeric and named like 'x1'
new = existing.str.replace('x', 'y')  # 'x1' -> 'y1', and so on

# zip yields the column names themselves, so use them directly as keys
for col_existing, col_new in zip(existing, new):
    df[col_new] = df[col_existing] * 2
# there may be a more elegant way using the pandas assign function
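
The assign idea mentioned above could look roughly like the following. This is only a sketch: the sample values are made up for illustration, and it assumes the columns to double are listed explicitly rather than taken from df.columns.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({"x1": [1, 2], "x2": [3, 4], "x3": [5, 6]})
existing = ["x1", "x2", "x3"]

# Build {new_name: doubled_column} and pass it to assign as keyword arguments.
# assign returns a new DataFrame and leaves df itself unchanged.
doubled = df.assign(**{col.replace("x", "y"): df[col] * 2 for col in existing})

print(doubled)
```

Because assign does not modify df in place, the result has to be bound to a name (or reassigned to df) to keep the new columns.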

Upvotes: 0

Chris

Reputation: 16147

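Your nested loops run 9 times here: for each of the 3 existing columns, the inner loop assigns to all 3 new columns, so each pass of the outer loop overwrites the previous one. After the last pass, y1, y2 and y3 all hold x3*2.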
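
To see the overwriting without touching a dataframe, here is a small sketch that lists the assignments the nested loops perform. The names assignments and last_writer are just illustrative.

```python
from itertools import product

existing = ["x1", "x2", "x3"]
new = ["y1", "y2", "y3"]

# product() yields the same (i, j) pairs, in the same order,
# as `for i in existing: for j in new:` does.
assignments = list(product(existing, new))
print(len(assignments))

# A later write to the same target replaces the earlier one,
# so record only the last source column written to each new column.
last_writer = {j: i for i, j in assignments}
print(last_writer)
```

Every new column ends up with x3 as its last source, which is why all three come out identical.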

You may want something like

for e, n in zip(existing,new):
    df[n] = df[e]*2
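
For completeness, a self-contained version of the same idea that can be run as-is. The sample values are made up; only the column names are taken from the question.

```python
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({"x1": [1, 2, 3], "x2": [10, 20, 30], "x3": [100, 200, 300]})

existing = ["x1", "x2", "x3"]
new = ["y1", "y2", "y3"]

# zip pairs x1 with y1, x2 with y2 and x3 with y3,
# so there is exactly one assignment per new column.
for e, n in zip(existing, new):
    df[n] = df[e] * 2

print(df)
```

Each y column now depends only on its matching x column, instead of on whichever existing column the outer loop visited last.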

Upvotes: 2
