taga
taga

Reputation: 3885

Join two Pandas DataFrames on specific column with matching values

I want to join two pandas dataframes on "ColA", but the thing is that values in "ColA" in these two dataframes are not in order and dataframes are not the same lenght. I want to join them so that missing values are changed with 0 value, and that values in "ColA" are matching.

df1 = pd.DataFrame({"ColA":["num 1", "num 2", "num 3"],
                   "ColB":[5,6,7]})

print(df1)
df2 = pd.DataFrame({"ColA":["num 2", "num 3","num 1", "num 4"],
                   "ColC":[3,2,1,5]})

print(df2)


    ColA  ColB
0  num 1     5
1  num 2     6
2  num 3     7

    ColA  ColC
0  num 2     3
1  num 3     2
2  num 1     1
3  num 4     5

Result should look like this:

# num1 is matched with appropriate values and num4 has the value 0 for "ColB" 

    ColA  ColB  ColC
0  num 1     5     1
1  num 2     6     3
2  num 3     7     2
3  num 4     0     5

Upvotes: 1

Views: 138

Answers (1)

jezrael
jezrael

Reputation: 862471

Use DataFrame.merge with outer join, convert NaNs to 0 and last if necessary convert dtypes to original by dictionary:

d = df1.dtypes.append(df2.dtypes).to_dict()
df = df1.merge(df2, how='outer', on='ColA').fillna(0).astype(d)
print (df)

    ColA  ColB  ColC
0  num 1     5     1
1  num 2     6     3
2  num 3     7     2
3  num 4     0     5

Or use concat with convert all columns to integers (if possible):

df = (pd.concat([df1.set_index('ColA'), 
                df2.set_index('ColA')], axis=1, sort=True)
        .fillna(0)
        .astype(int)
        .rename_axis('ColA')
        .reset_index())
print (df)
    ColA  ColB  ColC
0  num 1     5     1
1  num 2     6     3
2  num 3     7     2
3  num 4     0     5

Upvotes: 3

Related Questions