Reputation: 93
I have three DataFrames in my code. I need to combine all three into a single DataFrame and save it to an Excel sheet. The three DataFrames are attached below, and I want to create the new DataFrame based on asset_id.
I tried the following:
import pandas as pd
frames = [a_dataframe, pandaDf, pandaDf1]
result = pd.concat(frames)
It did not work. Can someone please help me?
The final output after merging:
final_df=pandaDf.append(pandaDf1, sort=True).merge(a_dataframe, on='asset_id')
final_df
final_df.head(10).to_dict()
Upvotes: 0
Views: 326
Reputation: 23217
You can stack the first two DataFrames and then merge the third DataFrame, as follows (pd.concat is used here because DataFrame.append was removed in pandas 2.0):
final_df = pd.concat([df1, df2], sort=True).merge(df3, on='asset_id')
Result:
print(final_df)
   asset_id property_name  property_value  revision_id
0         1       HORIZON              24          314
1         1   GRANULARITY              24          314
2         2       HORIZON              48           12
3         2   GRANULARITY              24           12
4         3       HORIZON              24           24
5         3   GRANULARITY              24           24
6         4       HORIZON              24           43
7         4   GRANULARITY              48           43
To further make HORIZON and GRANULARITY column headers and put their property_value under those columns, you can do a pivot, as follows:
final_df_pivoted = final_df.pivot(index=['asset_id', 'revision_id'], columns='property_name', values='property_value').rename_axis(columns=None).reset_index()
Result:
print(final_df_pivoted)
   asset_id  revision_id  GRANULARITY  HORIZON
0         1          314           24       24
1         2           12           24       48
2         3           24           24       24
3         4           43           48       24
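For anyone who wants to run the steps above end to end, here is a minimal self-contained sketch. The sample values are reconstructed from the printed results in this answer, and the variable names `horizon`, `granularity` and `revisions` are placeholders of mine, not names from the question. It uses pd.concat for the stacking step (DataFrame.append was removed in pandas 2.0), and passing a list as `index` to pivot assumes pandas 1.1 or later.

```python
import pandas as pd

# Hypothetical sample data, reconstructed from the results printed above
horizon = pd.DataFrame({
    "asset_id": [1, 2, 3, 4],
    "property_name": ["HORIZON"] * 4,
    "property_value": [24, 48, 24, 24],
})
granularity = pd.DataFrame({
    "asset_id": [1, 2, 3, 4],
    "property_name": ["GRANULARITY"] * 4,
    "property_value": [24, 24, 24, 48],
})
revisions = pd.DataFrame({
    "asset_id": [1, 2, 3, 4],
    "revision_id": [314, 12, 24, 43],
})

# Stack the two property frames row-wise, then attach revision_id per asset
stacked = pd.concat([horizon, granularity], ignore_index=True)
final_df = stacked.merge(revisions, on="asset_id")

# Turn each property_name into its own column (one row per asset)
final_df_pivoted = (
    final_df.pivot(index=["asset_id", "revision_id"],
                   columns="property_name",
                   values="property_value")
    .rename_axis(columns=None)
    .reset_index()
)
print(final_df_pivoted)
```

To save the result, something like final_df_pivoted.to_excel("combined.xlsx", index=False) should work, provided an Excel writer engine such as openpyxl is installed; the file name is only an example.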
Upvotes: 1
Reputation: 24049
Try this:
a_dataframe['property_name'] = a_dataframe['property_name'].astype(str)
pandaDf['property_name'] = pandaDf['property_name'].astype(str)
a_dataframe['property_value'] = a_dataframe['property_value'].astype(int)
pandaDf['property_value'] = pandaDf['property_value'].astype(int)
final_df = a_dataframe.merge(pandaDf, on='asset_id', how='left')
final_df = final_df.merge(pandaDf1, on='asset_id', how='left')
In one line, you can try this:
final_df = pandaDf1.merge(a_dataframe.merge(pandaDf,on='asset_id'),on='asset_id')
EDIT: code added by request:
import pandas as pd
df1 = pd.DataFrame({'id': {0: 1, 1: 2, 2: 3, 3:4},
'p_value': {0: "HOR", 1: "HOR", 2: "HOR", 3:"HOR"}})
df2 = pd.DataFrame({'id': {0: 1, 1: 2, 2: 3, 3:4},
'p_value': {0: "GRAN", 1: "GRAN", 2: "GRAN", 3:"GRAN"}})
df3 = pd.DataFrame({'id': {0: 1, 1: 2, 2: 3, 3:4},
'r_id': {0: "24", 1: "32", 2: "42", 3:"45"}})
final_df = df3.merge(df1.merge(df2, on='id'),on='id')
final_df
Output:
   id r_id p_value_x p_value_y
0   1   24       HOR      GRAN
1   2   32       HOR      GRAN
2   3   42       HOR      GRAN
3   4   45       HOR      GRAN
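The p_value_x and p_value_y names in that output come from merge's default suffixes for overlapping column names. If more readable names are wanted, the `suffixes` parameter of merge can set them. A small sketch using the same toy frames (the suffix strings `_hor` and `_gran` are just my choice for illustration):

```python
import pandas as pd

df1 = pd.DataFrame({"id": [1, 2, 3, 4], "p_value": ["HOR"] * 4})
df2 = pd.DataFrame({"id": [1, 2, 3, 4], "p_value": ["GRAN"] * 4})
df3 = pd.DataFrame({"id": [1, 2, 3, 4], "r_id": ["24", "32", "42", "45"]})

# Both frames have a 'p_value' column, so merge must rename them;
# suffixes controls how (the default is '_x' and '_y')
named = df1.merge(df2, on="id", suffixes=("_hor", "_gran"))
final_df = df3.merge(named, on="id")

print(final_df.columns.tolist())
# ['id', 'r_id', 'p_value_hor', 'p_value_gran']
```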
Upvotes: 3
Reputation: 101
The solution given above by user1740577 seems to work.
I created the three DataFrames directly in Python, tried that solution, and it worked. Here's my code. Try it to see if it works on your machine. If it doesn't, the way you load your Excel files into DataFrames may be what causes the error.
import numpy as np
import pandas as pd
import IPython
df0 = pd.DataFrame(data=[[1, 'HORIZON', 24],
[2, 'HORIZON', 48],
[3, 'HORIZON', 24],
[4, 'HORIZON', 24]], columns=['asset_id', 'property_name', 'property_value'])
df1 = pd.DataFrame(data=[[3, 'GRANULARITY', 24],
[4, 'GRANULARITY', 48],
[1, 'GRANULARITY', 24],
[2, 'GRANULARITY', 24]], columns=['asset_id', 'property_name', 'property_value'])
df2 = pd.DataFrame(data=[[4, 24],
[2, 43],
[3, 314],
[1, 12]], columns=['asset_id','revision_id'])
print(df0)
print(df1)
print(df2)
final_df = df0.merge(df1, on='asset_id', how='left')
final_df = final_df.merge(df2, on='asset_id', how='left')
print(final_df)
IPython.embed()
Upvotes: 1
Reputation: 169
The answer is correct, but it is not clear what your data represents or how you want to combine the DataFrames. In the first DataFrame, asset_id 1 has property_name 'HORIZON', but in the second DataFrame the same asset_id has property_name 'GRANULARITY'. This is confusing, and it makes it hard to combine the DataFrames cleanly and logically.
Renaming the columns with distinctive names might help others give helpful, usable answers.
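As a rough sketch of what that renaming could look like, assuming each property frame holds a single property_name: drop the constant property_name column, rename property_value after the property it holds, and then merge on asset_id. The sample data is copied from the answer above; the helper `as_named_column` and the column names `horizon` and `granularity` are hypothetical.

```python
import pandas as pd

df0 = pd.DataFrame([[1, "HORIZON", 24], [2, "HORIZON", 48],
                    [3, "HORIZON", 24], [4, "HORIZON", 24]],
                   columns=["asset_id", "property_name", "property_value"])
df1 = pd.DataFrame([[3, "GRANULARITY", 24], [4, "GRANULARITY", 48],
                    [1, "GRANULARITY", 24], [2, "GRANULARITY", 24]],
                   columns=["asset_id", "property_name", "property_value"])
df2 = pd.DataFrame([[4, 24], [2, 43], [3, 314], [1, 12]],
                   columns=["asset_id", "revision_id"])

def as_named_column(df, new_name):
    # Hypothetical helper: keep asset_id and expose property_value
    # under a distinctive column name
    return df[["asset_id", "property_value"]].rename(
        columns={"property_value": new_name})

combined = (
    as_named_column(df0, "horizon")
    .merge(as_named_column(df1, "granularity"), on="asset_id")
    .merge(df2, on="asset_id")
)
print(combined)
```

With distinct names there is no column clash, so no _x/_y suffixes appear and each asset ends up on a single row.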
Upvotes: 0