Reputation: 93

How to combine multiple dataframes in python

I have 3 data frames in my code...i need to combine all 3 into a single dataframe and save it in a excel sheet. Attached the below 3 data frames, and i wanted to created the new data frame based on asset_id.

I tried as below:

import pickle
import os
frames = [a_dataframe, pandaDf, pandaDf1]
result = pd.concat(frames)

It did not work...Can someone pls help me

dataframe1:

dataframe 2:

Dataframe 3:

output:

The final output after merging

final_df=pandaDf.append(pandaDf1, sort=True).merge(a_dataframe, on='asset_id')
final_df


final_df=pandaDf.append(pandaDf1, sort=True).merge(a_dataframe, on='asset_id')
final_df
final_df.head(10).to_dict()

Upvotes: 0

Answers (4)

SeaBean

Reputation: 23217

You can append the first 2 dataframes and then merge the 3rd dataframe, as follows:

final_df = df1.append(df2, sort=True).merge(df3, on='asset_id')

Result:

print(final_df)

   asset_id property_name  property_value  revision_id
0         1       HORIZON              24          314
1         1   GRANULARITY              24          314
2         2       HORIZON              48           12
3         2   GRANULARITY              24           12
4         3       HORIZON              24           24
5         3   GRANULARITY              24           24
6         4       HORIZON              24           43
7         4   GRANULARITY              48           43

Edit

To further make HORIZON and GRANULARITY as column header and put their property_value under the columns, you can do a pivot, as follows:

final_df_pivoted = final_df.pivot(index=['asset_id', 'revision_id'], columns='property_name', values='property_value').rename_axis(columns=None).reset_index()

Result:

print(final_df_pivoted)

   asset_id  revision_id  GRANULARITY  HORIZON
0         1          314           24       24
1         2           12           24       48
2         3           24           24       24
3         4           43           48       24

Upvotes: 1

I'mahdi

Reputation: 24049

try this:

a_dataframe['property_name'] = a_dataframe['property_name'].astype(str)
pandaDf['property_name'] = pandaDf['property_name'].astype(str)

a_dataframe['property_value'] = a_dataframe['property_value'].astype(int)
pandaDf['property_value'] = pandaDf['property_value'].astype(int)

final_df = a_dataframe.merge(pandaDf, on='asset_id', how='left')
final_df = final_df.merge(pandaDf1, on='asset_id', how='left')

in one line you can try this:

final_df = pandaDf1.merge(a_dataframe.merge(pandaDf,on='asset_id'),on='asset_id')

EDIT add code by request:

import pandas as pd
  
df1 = pd.DataFrame({'id': {0: 1, 1: 2, 2: 3, 3:4},
                   'p_value': {0: "HOR", 1: "HOR", 2: "HOR", 3:"HOR"}})

df2 = pd.DataFrame({'id': {0: 1, 1: 2, 2: 3, 3:4},
                   'p_value': {0: "GRAN", 1: "GRAN", 2: "GRAN", 3:"GRAN"}})


df3 = pd.DataFrame({'id': {0: 1, 1: 2, 2: 3, 3:4},
                   'r_id': {0: "24", 1: "32", 2: "42", 3:"45"}})

final_df = df3.merge(df1.merge(df2, on='id'),on='id')
final_df

output:


id  r_id    p_value_x   p_value_y
0   1   24  HOR GRAN
1   2   32  HOR GRAN
2   3   42  HOR GRAN
3   4   45  HOR GRAN

Upvotes: 3

Raklet57

Reputation: 101

Solution cited above by user1740577 seems to work.

I created directly the three dataframes in python and tried the solution cited and it worked. here's my code. Try to see if it works on your machine. If it doesn't, it might be the way of loading your excels into DataFrame which can cause an error.

import numpy as np
import pandas as pd
import IPython

df0 = pd.DataFrame(data=[[1, 'HORIZON', 24], 
                     [2, 'HORIZON', 48], 
                     [3, 'HORIZON', 24],
                     [4, 'HORIZON', 24]], columns=['asset_id', 'property_name', 'property_value'])


df1 = pd.DataFrame(data=[[3, 'GRANULARITY', 24],
                     [4, 'GRANULARITY', 48],
                     [1, 'GRANULARITY', 24],
                     [2, 'GRANULARITY', 24]], columns=['asset_id', 'property_name', 'property_value'])


df2 = pd.DataFrame(data=[[4, 24],
                     [2, 43],
                     [3, 314],
                     [1, 12]], columns=['asset_id','revision_id'])


print(df0)
print(df1)
print(df2)

final_df = df0.merge(df1, on='asset_id', how='left')
final_df = final_df.merge(df2, on='asset_id', how='left')

print(final_df)

IPython.embed()

Upvotes: 1

Esa Tuulari

Reputation: 169

The answer is correct, but what your data represent or how do you want to combine the dataframes is not clear. In the first dataframe "asset_id" 1 has "property name" 'HORIZON', but in the second dataframe the same "asset_id" has "property_name" 'GRANULARITY'. Confusing. And not possible to combine the dataframes nicely and logically.

Maybe renaming the columns with distinctive names could help in giving helpful and usable answers.

Upvotes: 0

How to combine multiple dataframes in python

Answers (4)

Edit

Related Questions