Bebio
Bebio

Reputation: 409

How to add columns with a for loop in a dataframe?

I have two dataframes df1, df2 described below

df1

         prod  age
0  Winalto_eu   28
1  Winalto_uc   25
2      CEM_eu   30

df2

    age   qx
0    25  2.7
1    26  2.8
2    27  2.8
3    28  2.9
4    29  3.0
5    30  3.2
6    31  3.4
7    32  3.7
8    33  4.1
9    34  4.6
10   35  5.1
11   36  5.6
12   37  6.1
13   38  6.7
14   39  7.5
15   40  8.2

I would like to add new columns with a for loop to df1. The names of the new colums should be qx1, qx2,...qx10

for i in range(0,10):
    df1['qx'+str(i)]

The values of qx1 should be affected by the loop, doing a kind of vlookup on the age : For instance on the first row, for the prod 'Winalto_eu', the value of qx1 should be the value of df2['qx'] at the age of 28+1, qx2 the same at 28+2...

The target dataframe should look like this :

         prod  age  qx1  qx2  qx3  qx4  qx5  qx6  qx7  qx8  qx9  qx10
0  Winalto_eu   28  3.0  3.2  3.4  3.7  4.1  4.6  5.1  5.6  6.1   6.7
1  Winalto_uc   25  2.8  2.8  2.9  3.0  3.2  3.4  3.7  4.1  4.6   5.1
2      CEM_eu   30  3.4  3.7  4.1  4.6  5.1  5.6  6.1  6.7  7.5   8.2

Have you any idea ? Thanks

Upvotes: 1

Views: 2098

Answers (5)

Georgina Skibinski
Georgina Skibinski

Reputation: 13387

Try:

df=df1.assign(key=0).merge(df2.assign(key=0), on="key", suffixes=["", "_y"]).query("age<age_y").drop(["key"], axis=1)

df["q"]=df.groupby("prod")["age_y"].rank()

#to keep only 10 positions for each
df=df.loc[df["q"]<=10]

df=df.pivot_table(index=["prod", "age"], columns="q", values="qx")

df.columns=[f"qx{col:0.0f}" for col in df.columns]

df=df.reset_index()

Output:

         prod  age  qx1  qx2  qx3  ...  qx6  qx7  qx8  qx9  qx10
0      CEM_eu   30  3.4  3.7  4.1  ...  5.6  6.1  6.7  7.5   8.2
1  Winalto_eu   28  3.0  3.2  3.4  ...  4.6  5.1  5.6  6.1   6.7
2  Winalto_uc   25  2.8  2.8  2.9  ...  3.4  3.7  4.1  4.6   5.1

Upvotes: 0

G. Anderson
G. Anderson

Reputation: 5955

Ridiculously overengineered solution:

pd.concat([df1,pd.DataFrame(columns=['qx'+str(i) for i in range(11)], 
                            data=[ser1.T.loc[:,i:i+10].values.flatten().tolist()
                                  for i in df1['age']])],
          axis=1)

    prod    age qx0 qx1 qx2 qx3 qx4 qx5 qx6 qx7 qx8 qx9 qx10
0   Winalto_eu  28  2.9 3.0 3.2 3.4 3.7 4.1 4.6 5.1 5.6 6.1 6.7
1   Winalto_uc  25  2.7 2.8 2.8 2.9 3.0 3.2 3.4 3.7 4.1 4.6 5.1
2   CEM_eu  30  3.2 3.4 3.7 4.1 4.6 5.1 5.6 6.1 6.7 7.5 8.2

Upvotes: 0

YOLO
YOLO

Reputation: 21709

Here's a way using .loc filtering the data:

top_n = 10

values = [df2.loc[df2['age'].gt(x),'qx'].iloc[:top_n].tolist() for x in df1['age']]

coln = ['qx'+str(x) for x in range(1,11)]

df1[coln] = pd.DataFrame(values)

         prod  age  qx1  qx2  qx3  qx4  qx5  qx6  qx7  qx8  qx9  qx10
0  Winalto_eu   28  3.0  3.2  3.4  3.7  4.1  4.6  5.1  5.6  6.1   6.7
1  Winalto_uc   25  2.8  2.8  2.9  3.0  3.2  3.4  3.7  4.1  4.6   5.1
2      CEM_eu   30  3.4  3.7  4.1  4.6  5.1  5.6  6.1  6.7  7.5   8.2

Upvotes: 0

At the beginning you should try with pd.df.set_index('prod",inplace=True) after that transponse df with qx

Upvotes: 0

Seshadri
Seshadri

Reputation: 679

I think this would give what you want. I used shift function to first generate additional columns in df2 and then merged with df1.

import pandas as pd

df1 = pd.DataFrame({'prod': ['Winalto_eu', 'Winalto_uc', 'CEM_eu'], 'age' : [28, 25, 30]})
df2 = pd.DataFrame({'age': list(range(25,41)), 'qx': [2.7, 2.8, 2.8, 2.9, 3, 3.2, 3.4, 3.7, 4.1, 4.6, 5.1, 5.6, 6.1, 6.7, 7.5, 8.2]})

for i in range(1,11):
    df2['qx'+str(i)] = df2.qx.shift(-i)

df3 = pd.merge(df1,df2,how = 'left',on = ['age'])

Upvotes: 2

Related Questions