Reputation: 409
I have two dataframes df1, df2 described below
df1
prod age
0 Winalto_eu 28
1 Winalto_uc 25
2 CEM_eu 30
df2
age qx
0 25 2.7
1 26 2.8
2 27 2.8
3 28 2.9
4 29 3.0
5 30 3.2
6 31 3.4
7 32 3.7
8 33 4.1
9 34 4.6
10 35 5.1
11 36 5.6
12 37 6.1
13 38 6.7
14 39 7.5
15 40 8.2
I would like to add new columns with a for loop to df1. The names of the new colums should be qx1, qx2,...qx10
for i in range(0,10):
df1['qx'+str(i)]
The values of qx1 should be affected by the loop, doing a kind of vlookup on the age : For instance on the first row, for the prod 'Winalto_eu', the value of qx1 should be the value of df2['qx'] at the age of 28+1, qx2 the same at 28+2...
The target dataframe should look like this :
prod age qx1 qx2 qx3 qx4 qx5 qx6 qx7 qx8 qx9 qx10
0 Winalto_eu 28 3.0 3.2 3.4 3.7 4.1 4.6 5.1 5.6 6.1 6.7
1 Winalto_uc 25 2.8 2.8 2.9 3.0 3.2 3.4 3.7 4.1 4.6 5.1
2 CEM_eu 30 3.4 3.7 4.1 4.6 5.1 5.6 6.1 6.7 7.5 8.2
Have you any idea ? Thanks
Upvotes: 1
Views: 2098
Reputation: 13387
Try:
df=df1.assign(key=0).merge(df2.assign(key=0), on="key", suffixes=["", "_y"]).query("age<age_y").drop(["key"], axis=1)
df["q"]=df.groupby("prod")["age_y"].rank()
#to keep only 10 positions for each
df=df.loc[df["q"]<=10]
df=df.pivot_table(index=["prod", "age"], columns="q", values="qx")
df.columns=[f"qx{col:0.0f}" for col in df.columns]
df=df.reset_index()
Output:
prod age qx1 qx2 qx3 ... qx6 qx7 qx8 qx9 qx10
0 CEM_eu 30 3.4 3.7 4.1 ... 5.6 6.1 6.7 7.5 8.2
1 Winalto_eu 28 3.0 3.2 3.4 ... 4.6 5.1 5.6 6.1 6.7
2 Winalto_uc 25 2.8 2.8 2.9 ... 3.4 3.7 4.1 4.6 5.1
Upvotes: 0
Reputation: 5955
Ridiculously overengineered solution:
pd.concat([df1,pd.DataFrame(columns=['qx'+str(i) for i in range(11)],
data=[ser1.T.loc[:,i:i+10].values.flatten().tolist()
for i in df1['age']])],
axis=1)
prod age qx0 qx1 qx2 qx3 qx4 qx5 qx6 qx7 qx8 qx9 qx10
0 Winalto_eu 28 2.9 3.0 3.2 3.4 3.7 4.1 4.6 5.1 5.6 6.1 6.7
1 Winalto_uc 25 2.7 2.8 2.8 2.9 3.0 3.2 3.4 3.7 4.1 4.6 5.1
2 CEM_eu 30 3.2 3.4 3.7 4.1 4.6 5.1 5.6 6.1 6.7 7.5 8.2
Upvotes: 0
Reputation: 21709
Here's a way using .loc
filtering the data:
top_n = 10
values = [df2.loc[df2['age'].gt(x),'qx'].iloc[:top_n].tolist() for x in df1['age']]
coln = ['qx'+str(x) for x in range(1,11)]
df1[coln] = pd.DataFrame(values)
prod age qx1 qx2 qx3 qx4 qx5 qx6 qx7 qx8 qx9 qx10
0 Winalto_eu 28 3.0 3.2 3.4 3.7 4.1 4.6 5.1 5.6 6.1 6.7
1 Winalto_uc 25 2.8 2.8 2.9 3.0 3.2 3.4 3.7 4.1 4.6 5.1
2 CEM_eu 30 3.4 3.7 4.1 4.6 5.1 5.6 6.1 6.7 7.5 8.2
Upvotes: 0
Reputation: 1
At the beginning you should try with pd.df.set_index('prod",inplace=True) after that transponse df with qx
Upvotes: 0
Reputation: 679
I think this would give what you want. I used shift function to first generate additional columns in df2 and then merged with df1.
import pandas as pd
df1 = pd.DataFrame({'prod': ['Winalto_eu', 'Winalto_uc', 'CEM_eu'], 'age' : [28, 25, 30]})
df2 = pd.DataFrame({'age': list(range(25,41)), 'qx': [2.7, 2.8, 2.8, 2.9, 3, 3.2, 3.4, 3.7, 4.1, 4.6, 5.1, 5.6, 6.1, 6.7, 7.5, 8.2]})
for i in range(1,11):
df2['qx'+str(i)] = df2.qx.shift(-i)
df3 = pd.merge(df1,df2,how = 'left',on = ['age'])
Upvotes: 2