haneulkim
haneulkim

Reputation: 4928

Merging multiple dataframes on different columns

Using Pandas 1.2.1

MRE:

df_a = pd.DataFrame({"A":[1,2,3,4], "B":[33, 44, 55, 66]})
df_b = pd.DataFrame({"B":[33, 44,99], "C":["v", "z", "z"]})
df_c = pd.DataFrame({"A":[3,4,77,55], "D":["aa", "bb", "cc", "dd"]})

Using three dfs created above I want to join all of them together however

  1. df_a, df_b share column "B" therefore they join on column "B"
  2. df_a, df_c share column "A" therefore they join on column "A"

I want to left_join df_b and df_c onto df_a. currently this is my method:

merged_df = pd.merge(df_a, df_b, on=["B"], how="left")
merged_df = pd.merge(merged_df, df_c, on=["A"], how="left")

I know works fine however I cannot stop to think there is a easier and faster way, there are multiple questions on joining multiple dfs on same column using reduce function however could not find solution for my question.

Upvotes: 1

Views: 71

Answers (1)

jezrael
jezrael

Reputation: 862581

You can remove on parameter, so it merging by intersection of columns names between DataFrames:

merged_df = pd.merge(df_a, df_b, how="left")
merged_df = pd.merge(merged_df, df_c, how="left")

More dynamic is use reduce, also is removed on parameter:

from functools import reduce
dfList = [df1, df2, df3]
df = reduce(lambda df1,df2: pd.merge(df1,df2,how="left"), dfList)

Upvotes: 3

Related Questions