Reputation: 1658
I'm try to merge several dataframes in loop. My code is :
my_dict = {
'1':{
'providers_using':{
'providers':['prov1','prov2','prov3'],
'calls_count':[10,20,30]
}
},
'2':{
'providers_using':{
'providers':['prov1','prov2','prov3'],
'calls_count':[100,200,300]
}
},
'3':{
'providers_using':{
'providers':['prov1','prov2','prov3'],
'calls_count':[1000,2000,3000]
}
}
}
df_joined = pd.DataFrame(columns = ['providers','calls_count'])
for i in range(1,4):
gw_df = pd.DataFrame.from_dict(my_dict[str(i)]['providers_using'])
gw_df.rename(columns={"calls_count":"gw" + str(i)})
df_joined = pd.merge(df_joined,gw_df,left_on='providers',right_on = 'providers',how='outer')
print(df_joined)
Result is:
calls_count_x providers calls_count_y calls_count_x calls_count_y
0 NaN prov1 10 100 1000
1 NaN prov2 20 200 2000
2 NaN prov3 30 300 3000
But i want get names columns "calls_count" as gw1,gw2,gw3, etc. The count of steps in loop maybe random so it's not acceptable just do :
df_joined.columns = ['gw1','gw2','gw3']
Expected output is:
providers gw1 gw3 gw3
0 prov1 10 100 1000
1 prov2 20 200 2000
2 prov3 30 300 3000
Btw - why columns name are "calls_count_y","calls_count_x" ?
Upvotes: 0
Views: 765
Reputation: 262484
Here is a way to get the expected output from the initial dictionary without looping:
(pd.concat({k: pd.DataFrame(v).T.apply(pd.Series.explode) # unnest the lists
for k,v in my_dict.items()})
.reset_index() # dict keys as column
.pivot(index='providers', # pivot the calls count
columns='level_0',
values='calls_count')
.add_prefix('gw') # rename columns
.rename_axis('', axis=1) # remove 'level_0' name
.reset_index() # index as column
)
output:
providers gw1 gw2 gw3
0 prov1 10 100 1000
1 prov2 20 200 2000
2 prov3 30 300 3000
Upvotes: 1
Reputation: 2019
Below is the trick. You can also rename the columns. The idea is to create a list of columns as per the name gw + str(i)
and create a column list.
col = []
for i in range(1,4):
col.append("gw" + str(i))
gw_df = pd.DataFrame.from_dict(my_dict[str(i)]['providers_using'])
df_joined = pd.merge(df_joined,gw_df,left_on='providers',right_on = 'providers',how='outer')
df_joined.columns = ['providers'] + col
print(df_joined)
Edit : As per the documentation of pandas merge, the suffixes will be _x and _y.
Upvotes: 2