Compare and join two columns in two dataframes

Question

I have two data frames with the same column types.

First Dataframe (df1)

import pandas as pd

data = [['BTC', 2], ['ETH', 1], ['ADA', 100]]
df1 = pd.DataFrame(data, columns=['Coin', 'Quantity'])

Coin     Quantity
BTC          2
ETH          1
ADA        100
...        ...

Second Dataframe (df2)

data = [['BTC', 50000], ['FTM', 50], ['ETH', 1500], ['LRC', 5], ['ADA', 20]]
df2 = pd.DataFrame(data, columns=['code_name', 'selling rate'])

code_name     selling rate
BTC               50000
FTM                  50
ETH                1500
LRC                   5
ADA                  20
...                 ...

Expected output (FTM and LRC should be removed)

Coin     Quantity     selling rate
BTC          2           50000
ETH          1            1500
ADA        100              20
...        ...             ...

What I have tried

df1.merge(df2, how='outer', left_on=['Coin'], right_on=['code_name'])

df = np.where(df1['Coin'] == df2['code_name'])

Neither attempt gave me the expected output. I searched Stack Overflow and couldn't find a helpful answer. Can anyone suggest a solution, or mark this question as a duplicate if a related question exists?

rossdrucker9 · Accepted Answer

What you need is an inner join, not an outer join. An outer join keeps every row from both tables and fills the gaps with NaN, which is why FTM and LRC still show up. An inner join keeps only the rows whose key appears in both tables.
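To see the difference on the data from the question, here is a minimal sketch. The names `left`, `right`, `outer`, `inner`, and `right_only` are only illustrative, and `indicator=True` is added here purely to label where each row came from; it was not part of the original attempt.

```python
import pandas as pd

# Same data as in the question (illustrative variable names)
left = pd.DataFrame({'Coin': ['BTC', 'ETH', 'ADA'], 'Quantity': [2, 1, 100]})
right = pd.DataFrame({
    'code_name': ['BTC', 'FTM', 'ETH', 'LRC', 'ADA'],
    'selling rate': [50000, 50, 1500, 5, 20],
})

# Outer join: keeps every key from both frames; indicator=True adds a
# '_merge' column saying whether a row came from left, right, or both
outer = left.merge(right, how='outer', left_on='Coin',
                   right_on='code_name', indicator=True)

# Inner join: keeps only keys present in both frames
inner = left.merge(right, how='inner', left_on='Coin', right_on='code_name')

# Keys that exist only in the right frame; these are the unwanted rows
right_only = sorted(outer.loc[outer['_merge'] == 'right_only', 'code_name'])

print(len(outer), len(inner), right_only)
```

The rows flagged `right_only` are exactly the ones the outer join carries along with missing `Coin` and `Quantity` values.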

import pandas as pd

# Make the first data frame
df1 = pd.DataFrame({
    'Coin': ['BTC', 'ETH', 'ADA'],
    'Quantity': [2, 1, 100]
})

# Make the second data frame
df2 = pd.DataFrame({
    'code_name': ['BTC', 'FTM', 'ETH', 'LRC', 'ADA'],
    'selling rate': [50000, 50, 1500, 5, 20]
})

# Merge the data frames via inner join. This only keeps entries that appear in
# both data frames
full_df = df1.merge(df2, how='inner', left_on='Coin', right_on='code_name')

# Drop the duplicate key column
full_df = full_df.drop(columns='code_name')
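As a possible variation (not part of the accepted answer), renaming the key column before merging avoids the duplicate column altogether, so there is nothing to drop afterwards. This sketch uses the column names from the question; `result` is just an illustrative name.

```python
import pandas as pd

df1 = pd.DataFrame({
    'Coin': ['BTC', 'ETH', 'ADA'],
    'Quantity': [2, 1, 100]
})

df2 = pd.DataFrame({
    'code_name': ['BTC', 'FTM', 'ETH', 'LRC', 'ADA'],
    'selling rate': [50000, 50, 1500, 5, 20]
})

# Rename df2's key so both frames share the name 'Coin'; merging with on=
# then produces a single key column in the result
result = df1.merge(df2.rename(columns={'code_name': 'Coin'}),
                   on='Coin', how='inner')

print(result)
```

Whether to rename or drop is mostly a matter of taste; renaming leaves `df2` itself unchanged because `rename` returns a new frame.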
